Google Gemini: Multimodal AI for Next-Generation Applications
The Gemini Family
┌─────────────────────────────────────────────────────────────┐
│              Google Gemini Model Lineup (2025)              │
├──────────────────┬─────────────┬────────────┬───────────────┤
│ Model            │ Context     │ Modalities │ Strength      │
├──────────────────┼─────────────┼────────────┼───────────────┤
│ Gemini 2.5 Pro   │ 1M tokens   │ Everything │ Best overall  │
│ Gemini 2.5 Flash │ 1M tokens   │ Everything │ Speed+cost    │
│ Gemini 2.0 Flash │ 1M tokens   │ Everything │ Production    │
│ Gemini 1.5 Pro   │ 2M tokens   │ Everything │ Long context  │
│ Gemma 3 (OSS)    │ 128K        │ Text+Vision│ Self-hosted   │
└──────────────────┴─────────────┴────────────┴───────────────┘
Modalities: Text, Images, Audio, Video, Code, Documents
Setup and Authentication
```bash
pip install google-generativeai
```
```python
import os

import google.generativeai as genai

# Read the API key from the environment rather than hard-coding it.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel(
    model_name="gemini-2.5-flash",
    system_instruction="You are a concise, expert technical assistant.",
    generation_config=genai.GenerationConfig(
        temperature=0.3,
        max_output_tokens=2048,
        response_mime_type="text/plain",
    ),
)

response = model.generate_content("Explain transformer attention in 3 sentences.")
print(response.text)
print(f"Input tokens: {response.usage_metadata.prompt_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")
```
Multimodal: Text + Images + Video
```python
import time

import google.generativeai as genai
import PIL.Image

model = genai.GenerativeModel("gemini-2.5-pro")


# Image analysis
def analyze_image(image_path: str, prompt: str) -> str:
    img = PIL.Image.open(image_path)
    response = model.generate_content([img, prompt])
    return response.text


result = analyze_image(
    "screenshot.png",
    "Identify all UI components and suggest accessibility improvements.",
)


# Video understanding (unique to Gemini)
def analyze_video(video_path: str, prompt: str) -> str:
    video_file = genai.upload_file(video_path, mime_type="video/mp4")

    # Uploaded videos are processed asynchronously; poll until processing ends.
    while video_file.state.name == "PROCESSING":
        time.sleep(2)
        video_file = genai.get_file(video_file.name)
    if video_file.state.name == "FAILED":
        raise ValueError(f"Video processing failed for {video_path}")

    response = model.generate_content([video_file, prompt])
    return response.text


# Analyze a recorded user testing session
insights = analyze_video(
    "user-session.mp4",
    "Identify usability issues: where does the user hesitate, look confused, or make errors?",
)


# PDF / document processing: Gemini natively understands PDFs
def process_pdf(pdf_path: str, query: str) -> str:
    pdf_file = genai.upload_file(pdf_path, mime_type="application/pdf")
    response = model.generate_content([pdf_file, query])
    return response.text
```
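The polling loop in `analyze_video` waits indefinitely if processing never finishes. One possible variation, shown here as a minimal sketch, adds a timeout and treats any final state other than `ACTIVE` as an error. The helper name `wait_until_ready`, its parameters, and the `fake_refresh` stand-in are hypothetical; in real use `refresh` would be `genai.get_file`.

```python
import time
from types import SimpleNamespace


def wait_until_ready(file, refresh, poll_seconds=2.0, timeout_seconds=300.0, sleep=time.sleep):
    """Poll refresh(file.name) until the file leaves PROCESSING or the timeout expires."""
    waited = 0.0
    while file.state.name == "PROCESSING":
        if waited >= timeout_seconds:
            raise TimeoutError(f"{file.name} still processing after {timeout_seconds}s")
        sleep(poll_seconds)
        waited += poll_seconds
        file = refresh(file.name)
    if file.state.name != "ACTIVE":
        raise RuntimeError(f"{file.name} ended in state {file.state.name}")
    return file


# Hypothetical stand-in for genai.get_file so the sketch runs without network access.
_states = iter(["PROCESSING", "ACTIVE"])


def fake_refresh(name):
    return SimpleNamespace(name=name, state=SimpleNamespace(name=next(_states)))


pending = SimpleNamespace(name="files/demo", state=SimpleNamespace(name="PROCESSING"))
ready = wait_until_ready(pending, fake_refresh, sleep=lambda seconds: None)
print(ready.state.name)
```

Passing `sleep` as a parameter keeps the helper testable without real delays; production code would leave the default in place.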
Structured Output with Response Schema
```python
import json

import google.generativeai as genai
import typing_extensions as typing


class TechArticle(typing.TypedDict):
    title: str
    summary: str
    key_concepts: list[str]
    difficulty: typing.Literal["BEGINNER", "INTERMEDIATE", "ADVANCED"]
    estimated_read_time_minutes: int


model = genai.GenerativeModel(
    model_name="gemini-2.5-flash",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=TechArticle,
    ),
)

# In practice, include the article text in the prompt alongside this instruction.
response = model.generate_content(
    "Analyze this article about RAG systems and extract metadata."
)

article: TechArticle = json.loads(response.text)
print(f"Difficulty: {article['difficulty']}, Read time: {article['estimated_read_time_minutes']} min")
```
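`json.loads` only guarantees valid JSON, not that every field is present with the expected type. A minimal sketch of a local check before the data is used downstream follows; the `validate_article` helper and the sample payload are hypothetical and mirror the `TechArticle` fields above.

```python
import json

# Expected top-level fields and their Python types, mirroring TechArticle.
REQUIRED_FIELDS = {
    "title": str,
    "summary": str,
    "key_concepts": list,
    "difficulty": str,
    "estimated_read_time_minutes": int,
}
ALLOWED_DIFFICULTY = {"BEGINNER", "INTERMEDIATE", "ADVANCED"}


def validate_article(raw_json: str) -> dict:
    """Parse a JSON string and check field presence, basic types, and the difficulty value."""
    data = json.loads(raw_json)
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"field {field} should be {expected_type.__name__}")
    if data["difficulty"] not in ALLOWED_DIFFICULTY:
        raise ValueError(f"unexpected difficulty: {data['difficulty']}")
    return data


# Hypothetical payload standing in for response.text.
sample = json.dumps({
    "title": "RAG in Practice",
    "summary": "How retrieval-augmented generation works.",
    "key_concepts": ["retrieval", "embeddings"],
    "difficulty": "INTERMEDIATE",
    "estimated_read_time_minutes": 8,
})
checked = validate_article(sample)
print(checked["difficulty"])
```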
Grounding with Google Search
Gemini can search the web in real time and ground its answers in current information:
```python
import google.generativeai as genai

model = genai.GenerativeModel("gemini-2.5-pro")

# The search tool's proto class and field names differ between SDK versions;
# check the names below against the version you have installed.
tool = genai.protos.Tool(
    google_search=genai.protos.GoogleSearch()
)

response = model.generate_content(
    "What are the latest AI model releases in the last week?",
    tools=[tool],
)

print(response.text)
# Cites sources, includes current information
for chunk in response.candidates[0].grounding_metadata.grounding_chunks:
    print(f"Source: {chunk.web.title} ({chunk.web.uri})")
```
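Grounded answers often cite the same page more than once. As a rough sketch, the sources can be de-duplicated by URI before display. The `unique_sources` helper is hypothetical and only assumes that each chunk exposes `web.title` and `web.uri`, as in the loop above; the sample chunks are made up for illustration.

```python
from types import SimpleNamespace


def unique_sources(chunks):
    """Return (title, uri) pairs in first-seen order, skipping repeats and chunks without a URI."""
    seen = set()
    sources = []
    for chunk in chunks:
        web = getattr(chunk, "web", None)
        if web is None or not web.uri or web.uri in seen:
            continue
        seen.add(web.uri)
        sources.append((web.title, web.uri))
    return sources


# Made-up chunks standing in for grounding_metadata.grounding_chunks.
demo_chunks = [
    SimpleNamespace(web=SimpleNamespace(title="Example A", uri="https://example.com/a")),
    SimpleNamespace(web=SimpleNamespace(title="Example A", uri="https://example.com/a")),
    SimpleNamespace(web=SimpleNamespace(title="Example B", uri="https://example.com/b")),
]
sources = unique_sources(demo_chunks)
for title, uri in sources:
    print(f"Source: {title} ({uri})")
```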
Code Execution
Gemini 2.5 Pro can write AND run Python code, returning actual computed results:
```python
import google.generativeai as genai

model = genai.GenerativeModel(
    model_name="gemini-2.5-pro",
    tools=["code_execution"],
)

response = model.generate_content(
    """Analyze this dataset and find statistical outliers:
    [23, 45, 12, 67, 234, 34, 56, 11, 890, 45, 23, 67]
    Plot a box plot and return the outlier values."""
)

for part in response.candidates[0].content.parts:
    # Unset fields still exist as attributes on a part, so hasattr() is always
    # true; test for actual content instead.
    if part.executable_code.code:
        print(f"Code executed:\n{part.executable_code.code}")
    if part.code_execution_result.output:
        print(f"Output:\n{part.code_execution_result.output}")
    if part.text:
        print(f"Analysis:\n{part.text}")
```
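To sanity-check what the model reports, the same dataset can be analyzed locally. The sketch below assumes the common 1.5 × IQR rule, which the prompt does not specify, and uses the default quartile method of Python's `statistics.quantiles`; the model may choose a different outlier definition.

```python
import statistics

data = [23, 45, 12, 67, 234, 34, 56, 11, 890, 45, 23, 67]

# Quartiles using the default ("exclusive") method of statistics.quantiles.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Assumed rule: values beyond 1.5 * IQR from the quartiles count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}]")
print(f"Outliers: {outliers}")  # Outliers: [234, 890]
```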
Multi-Turn Conversations
```python
# Reuses a `model` created as in the earlier sections.
chat = model.start_chat(history=[])

# Maintains full conversation history automatically
response1 = chat.send_message("I'm building a recommendation engine. What algorithm should I use?")
print(response1.text)

response2 = chat.send_message("My dataset has 50M users and 1M items. Does that change your recommendation?")
print(response2.text)  # References previous context

response3 = chat.send_message("Show me a Python implementation of the approach you suggested.")
print(response3.text)  # Builds on both previous messages

# Inspect full history
for message in chat.history:
    print(f"{message.role}: {message.parts[0].text[:100]}...")
```
Long-Context: 1M Token Window
A 1M to 2M token context window is large enough to hold many mid-sized codebases in a single prompt, which makes whole-repository analysis practical:
```python
import os
from pathlib import Path

import google.generativeai as genai

FENCE = "`" * 3  # Markdown code fence, built here to avoid nesting literal fences


def analyze_entire_codebase(project_dir: str, question: str) -> str:
    """Feed an entire codebase to Gemini for analysis."""
    code_content = []
    total_chars = 0

    for root, dirs, files in os.walk(project_dir):
        # Skip node_modules, .git, etc. (pruning dirs in place stops os.walk descending)
        dirs[:] = [d for d in dirs if d not in {".git", "node_modules", "dist", ".next"}]

        for file in files:
            if file.endswith((".ts", ".tsx", ".py", ".go", ".rs")):
                path = os.path.join(root, file)
                content = Path(path).read_text(errors="ignore")
                rel_path = os.path.relpath(path, project_dir)
                code_content.append(f"### {rel_path}\n{FENCE}\n{content}\n{FENCE}")
                total_chars += len(content)

    full_context = "\n\n".join(code_content)
    print(f"Feeding {len(code_content)} files ({total_chars:,} chars) to Gemini")

    model = genai.GenerativeModel("gemini-1.5-pro")  # 2M context
    response = model.generate_content(
        f"Codebase:\n{full_context}\n\nQuestion: {question}"
    )
    return response.text


# Ask architectural questions across the entire codebase
analysis = analyze_entire_codebase(
    "./my-app",
    "Find all security vulnerabilities (XSS, SQLi, auth issues) in this codebase.",
)
```
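Before sending a whole repository, it helps to check that it plausibly fits in the window. The sketch below uses a rough heuristic of about 4 characters per token, which is an assumption rather than a documented figure, and the helper names and the `reserve_tokens` default are hypothetical. For an exact count, the SDK's `model.count_tokens(...)` can be called on the assembled prompt instead.

```python
def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from a character count (heuristic, not exact)."""
    return int(num_chars / chars_per_token)


def fits_in_context(
    num_chars: int,
    context_tokens: int,
    reserve_tokens: int = 8192,
    chars_per_token: float = 4.0,
) -> bool:
    """Check whether the estimated prompt plus a reserve for the answer fits the window."""
    return estimate_tokens(num_chars, chars_per_token) + reserve_tokens <= context_tokens


# 3M characters is roughly 750K tokens under this heuristic.
small_repo_ok = fits_in_context(3_000_000, context_tokens=1_000_000)
# 9M characters is roughly 2.25M tokens, which exceeds a 2M window.
large_repo_ok = fits_in_context(9_000_000, context_tokens=2_000_000)
print(small_repo_ok, large_repo_ok)
```

Source code tends to tokenize less efficiently than English prose, so treating the estimate as optimistic and leaving extra headroom is a reasonable precaution.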
Vertex AI Integration (Production)
```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-gcp-project", location="us-central1")

model = GenerativeModel(
    "gemini-2.5-pro",
    system_instruction="You are a production AI assistant.",
)

# Vertex AI provides: enterprise SLAs, VPC, audit logs, IAM, no training on your data
response = model.generate_content(
    ["Explain cloud architecture best practices for fintech applications."],
    generation_config={
        "max_output_tokens": 2048,
        "temperature": 0.1,
    },
)
print(response.text)
```
Gemini vs Claude vs GPT-4o at a Glance
┌───────────────┬──────────────┬──────────────┬──────────────┐
│ Feature       │ Gemini 2.5   │ Claude 3.7   │ GPT-4o       │
├───────────────┼──────────────┼──────────────┼──────────────┤
│ Context       │ 1M tokens    │ 200K tokens  │ 128K tokens  │
│ Video input   │ ✅ Native    │ ❌           │ ❌           │
│ Audio input   │ ✅ Native    │ ❌           │ ✅ (Whisper) │
│ Code execution│ ✅ Built-in  │ ❌           │ ✅ (sandbox) │
│ Web search    │ ✅ Built-in  │ ❌           │ ✅ (plugin)  │
│ Cost/MTok out │ ~$3.50       │ ~$15         │ ~$10         │
│ Safety focus  │ Medium       │ High (Const.)│ Medium       │
└───────────────┴──────────────┴──────────────┴──────────────┘
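To make the cost row concrete, here is a small worked example that applies the approximate per-million-output-token prices from the table to a hypothetical workload of 250,000 output tokens. The prices are the table's rough figures, not current list prices, and input-token costs are ignored.

```python
# Approximate output prices in USD per million tokens, taken from the table above.
PRICE_PER_MTOK_OUT = {
    "Gemini 2.5": 3.50,
    "Claude 3.7": 15.00,
    "GPT-4o": 10.00,
}


def output_cost(tokens: int, price_per_mtok: float) -> float:
    """Cost in USD for a number of output tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


workload_tokens = 250_000  # hypothetical monthly output volume
costs = {name: output_cost(workload_tokens, price) for name, price in PRICE_PER_MTOK_OUT.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
```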