Back to all articles
Featured image for article: Google Gemini: Multimodal AI for Next-Generation Applications
AI
22 min read2,901 views

Google Gemini: Multimodal AI for Next-Generation Applications

A complete technical guide to Google Gemini — Gemini 2.5 Pro, Flash models, multimodal capabilities, the Google AI SDK, Vertex AI integration, code generation, and building production apps with Gemini.

#Gemini#Google AI#Multimodal#LLM#Vertex AI#AI

Google Gemini: Multimodal AI for Next-Generation Applications

The Gemini Family

┌──────────────────────────────────────────────────────────────┐
│              Google Gemini Model Lineup (2025)               │
├──────────────────┬─────────────┬────────────┬───────────────┤
│  Model           │  Context    │  Modalities│  Strength     │
├──────────────────┼─────────────┼────────────┼───────────────┤
│  Gemini 2.5 Pro  │  1M tokens  │ Everything │  Best overall │
│  Gemini 2.5 Flash│  1M tokens  │ Everything │  Speed+cost   │
│  Gemini 2.0 Flash│  1M tokens  │ Everything │  Production   │
│  Gemini 1.5 Pro  │  2M tokens  │ Text+Vision│  Long context │
│  Gemma 3 (OSS)   │  128K       │ Text+Vision│  Self-hosted  │
└──────────────────┴─────────────┴────────────┴───────────────┘
  Modalities: Text, Images, Audio, Video, Code, Documents

Setup and Authentication

python
1pip install google-generativeai
python
1import google.generativeai as genai 2import os 3 4genai.configure(api_key=os.environ["GOOGLE_API_KEY"]) 5 6model = genai.GenerativeModel( 7 model_name="gemini-2.5-flash", 8 system_instruction="You are a concise, expert technical assistant.", 9 generation_config=genai.GenerationConfig( 10 temperature=0.3, 11 max_output_tokens=2048, 12 response_mime_type="text/plain" 13 ) 14) 15 16response = model.generate_content("Explain transformer attention in 3 sentences.") 17print(response.text) 18print(f"Input tokens: {response.usage_metadata.prompt_token_count}") 19print(f"Output tokens: {response.usage_metadata.candidates_token_count}")

Multimodal: Text + Images + Video

python
1import google.generativeai as genai 2from pathlib import Path 3import PIL.Image 4 5model = genai.GenerativeModel("gemini-2.5-pro") 6 7# Image analysis 8def analyze_image(image_path: str, prompt: str) -> str: 9 img = PIL.Image.open(image_path) 10 response = model.generate_content([img, prompt]) 11 return response.text 12 13result = analyze_image( 14 "screenshot.png", 15 "Identify all UI components and suggest accessibility improvements." 16) 17 18# Video understanding (unique to Gemini) 19def analyze_video(video_path: str, prompt: str) -> str: 20 video_file = genai.upload_file(video_path, mime_type="video/mp4") 21 22 # Wait for processing 23 import time 24 while video_file.state.name == "PROCESSING": 25 time.sleep(2) 26 video_file = genai.get_file(video_file.name) 27 28 response = model.generate_content([video_file, prompt]) 29 return response.text 30 31# Analyze a recorded user testing session 32insights = analyze_video( 33 "user-session.mp4", 34 "Identify usability issues — where does the user hesitate, look confused, or make errors?" 35) 36 37# PDF / Document processing — Gemini natively understands PDFs 38def process_pdf(pdf_path: str, query: str) -> str: 39 pdf_file = genai.upload_file(pdf_path, mime_type="application/pdf") 40 response = model.generate_content([pdf_file, query]) 41 return response.text

Structured Output with Response Schema

python
1import google.generativeai as genai 2import typing_extensions as typing 3 4class TechArticle(typing.TypedDict): 5 title: str 6 summary: str 7 key_concepts: list[str] 8 difficulty: typing.Literal["BEGINNER", "INTERMEDIATE", "ADVANCED"] 9 estimated_read_time_minutes: int 10 11model = genai.GenerativeModel( 12 model_name="gemini-2.5-flash", 13 generation_config=genai.GenerationConfig( 14 response_mime_type="application/json", 15 response_schema=TechArticle 16 ) 17) 18 19response = model.generate_content( 20 "Analyze this article about RAG systems and extract metadata." 21) 22 23import json 24article: TechArticle = json.loads(response.text) 25print(f"Difficulty: {article['difficulty']}, Read time: {article['estimated_read_time_minutes']} min")

Grounding with Google Search

Gemini can search the web in real-time and ground its answers in current information:

python
1model = genai.GenerativeModel("gemini-2.5-pro") 2 3tool = genai.protos.Tool( 4 google_search=genai.protos.GoogleSearch() 5) 6 7response = model.generate_content( 8 "What are the latest AI model releases in the last week?", 9 tools=[tool] 10) 11 12print(response.text) 13# Cites sources, includes current information 14for chunk in response.candidates[0].grounding_metadata.grounding_chunks: 15 print(f"Source: {chunk.web.title}{chunk.web.uri}")

Code Execution

Gemini 2.5 Pro can write AND run Python code, returning actual computed results:

python
1model = genai.GenerativeModel( 2 model_name="gemini-2.5-pro", 3 tools=["code_execution"] 4) 5 6response = model.generate_content( 7 """Analyze this dataset and find statistical outliers: 8 [23, 45, 12, 67, 234, 34, 56, 11, 890, 45, 23, 67] 9 Plot a box plot and return the outlier values.""" 10) 11 12for part in response.candidates[0].content.parts: 13 if hasattr(part, "executable_code"): 14 print(f"Code executed:\n{part.executable_code.code}") 15 if hasattr(part, "code_execution_result"): 16 print(f"Output:\n{part.code_execution_result.output}") 17 if hasattr(part, "text"): 18 print(f"Analysis:\n{part.text}")

Multi-Turn Conversations

python
1chat = model.start_chat(history=[]) 2 3# Maintains full conversation history automatically 4response1 = chat.send_message("I'm building a recommendation engine. What algorithm should I use?") 5print(response1.text) 6 7response2 = chat.send_message("My dataset has 50M users and 1M items. Does that change your recommendation?") 8print(response2.text) # References previous context 9 10response3 = chat.send_message("Show me a Python implementation of the approach you suggested.") 11print(response3.text) # Builds on both previous messages 12 13# Inspect full history 14for message in chat.history: 15 print(f"{message.role}: {message.parts[0].text[:100]}...")

Long-Context: 1M Token Window

Gemini's 1M-2M token context is transformative for large codebase analysis:

python
1import os 2 3def analyze_entire_codebase(project_dir: str, question: str) -> str: 4 """Feed an entire codebase to Gemini for analysis.""" 5 code_content = [] 6 total_chars = 0 7 8 for root, dirs, files in os.walk(project_dir): 9 # Skip node_modules, .git, etc. 10 dirs[:] = [d for d in dirs if d not in {".git", "node_modules", "dist", ".next"}] 11 12 for file in files: 13 if file.endswith((".ts", ".tsx", ".py", ".go", ".rs")): 14 path = os.path.join(root, file) 15 content = Path(path).read_text(errors="ignore") 16 rel_path = os.path.relpath(path, project_dir) 17 code_content.append(f"### {rel_path}\n```\n{content}\n```") 18 total_chars += len(content) 19 20 full_context = "\n\n".join(code_content) 21 print(f"Feeding {len(code_content)} files ({total_chars:,} chars) to Gemini") 22 23 model = genai.GenerativeModel("gemini-1.5-pro") # 2M context 24 response = model.generate_content( 25 f"Codebase:\n{full_context}\n\nQuestion: {question}" 26 ) 27 return response.text 28 29# Ask architectural questions across the entire codebase 30analysis = analyze_entire_codebase( 31 "./my-app", 32 "Find all security vulnerabilities (XSS, SQLi, auth issues) in this codebase." 33)

Vertex AI Integration (Production)

python
1import vertexai 2from vertexai.generative_models import GenerativeModel, Part 3 4vertexai.init(project="my-gcp-project", location="us-central1") 5 6model = GenerativeModel( 7 "gemini-2.5-pro", 8 system_instruction="You are a production AI assistant." 9) 10 11# Vertex AI provides: enterprise SLAs, VPC, audit logs, IAM, no data training 12response = model.generate_content( 13 ["Explain cloud architecture best practices for fintech applications."], 14 generation_config={ 15 "max_output_tokens": 2048, 16 "temperature": 0.1 17 } 18)

Gemini vs Claude vs GPT-4o at a Glance

┌───────────────┬──────────────┬──────────────┬──────────────┐
│ Feature       │ Gemini 2.5   │ Claude 3.7   │ GPT-4o       │
├───────────────┼──────────────┼──────────────┼──────────────┤
│ Context       │ 1M tokens    │ 200K tokens  │ 128K tokens  │
│ Video input   │ ✅ Native    │ ❌           │ ❌           │
│ Audio input   │ ✅ Native    │ ❌           │ ✅ (Whisper) │
│ Code execution│ ✅ Built-in  │ ❌           │ ✅ (sandbox) │
│ Web search    │ ✅ Built-in  │ ❌           │ ✅ (plugin)  │
│ Cost/MTok out │ ~$3.50       │ ~$15         │ ~$10         │
│ Safety focus  │ Medium       │ High (Const.)│ Medium       │
└───────────────┴──────────────┴──────────────┴──────────────┘
Profile picture of Sumit Kumar Pandey

Sumit Kumar Pandey

Full-Stack Developer

Full-Stack Developer with 5+ years of experience building scalable web applications. Passionate about clean code, performance optimization, and modern web technologies.

About the Author

Author information for Sumit Kumar Pandey

Share this article

Found this helpful? Share with your network!

0 shares

Discussion (0)

Share your thoughts and join the conversation

Leave a comment

Be respectful and stay on topic

Write your comment in the text area above. Comments should be respectful and relevant to the article.

AI Chat Assistant

Interactive AI assistant for Sumit Kumar Pandey's portfolio website. Ask questions about technical skills, work experience, projects, availability, and contact information. Powered by Next.js API.