Module 3 - Lesson 2c: Gemini Media Prompts
Analyzing images with Google Gemini's Vision API.
Published: 2/18/2026
Lesson 2c: Media Prompts with Google Gemini
Learn how to analyze images using Google Gemini's Vision API. This lesson demonstrates Gemini's streamlined approach with inlineData and simple response access.
What It Does
Sends an image together with text instructions to Gemini, then prints the model's analysis and the token usage for the request. Gemini Flash models are tuned for speed and cost efficiency on multimodal tasks.
Key Differences from OpenAI & Anthropic
- inlineData Structure: Images use inlineData with mimeType
- Parts Array: Content is organized as a parts array
- Simple Response: Direct access via response.text
- System in Config: System instruction is in the config object
Code Example
The complete code is in src/gemini/media-prompt.ts:
    import { GoogleGenAI, ApiError } from "@google/genai";
    import dotenv from "dotenv";
    import fs from "fs";

    // Load environment variables
    dotenv.config();

    // Create Gemini client with typed configuration
    const gemini = new GoogleGenAI({});

    // Async function with proper return type
    async function mediaPrompt(): Promise<void> {
      try {
        console.log("Testing Gemini connection...");

        // Read image file and convert to base64
        const base64Image = fs.readFileSync("./src/assets/lizard.jpg", "base64");

        // Make API call - response is automatically typed!
        // Using a system prompt along with user prompt
        // note showing use of role
        const response = await gemini.models.generateContent({
          model: "gemini-3-flash-preview",
          contents: {
            role: "user",
            parts: [
              {
                text: "Analyze the image and describe the species present, their behaviors, and any notable ecological interactions. The photo was taken in the Galápagos Islands.",
              },
              {
                inlineData: {
                  mimeType: "image/jpeg",
                  data: base64Image,
                },
              },
            ],
          },
          config: {
            systemInstruction:
              "You are a naturalist. Provide detailed information about the flora and fauna in the image.",
          },
        });

        console.log("✅ Media Prompt Success!");

        // show response usage
        console.log("Tokens used:");
        console.dir(response.usageMetadata, { depth: null });

        // Check if we got a response
        if (!response.text || response.text.length === 0) {
          throw new Error("No content in response");
        }

        // TypeScript knows the structure of response
        console.log("AI Response:", response.text);
      } catch (error) {
        // Proper error handling with type guards
        if (error instanceof ApiError) {
          console.log("❌ API Error:", error.status, error.message);
        } else if (error instanceof Error) {
          console.log("❌ Error:", error.message);
        } else {
          console.log("❌ Unknown error occurred");
        }
      }
    }

    // Run the test
    mediaPrompt().catch((error) => {
      console.error("Error:", error);
    });
Run It
pnpm tsx src/gemini/media-prompt.ts
Expected Output
Testing Gemini connection...
✅ Media Prompt Success!
Tokens used:
{
promptTokenCount: 1156,
candidatesTokenCount: 198,
totalTokenCount: 1354
}
AI Response: This image shows a marine iguana, scientifically known as
Amblyrhynchus cristatus. These unique reptiles are endemic to the Galápagos
Islands and are the only lizards in the world that have adapted to a marine
lifestyle...
Key Concepts
1. InlineData Structure
Gemini uses inlineData for embedded images:
    {
      inlineData: {
        mimeType: "image/jpeg", // MIME type
        data: base64Image       // Raw base64 (no prefix)
      }
    }
Simple and clean - just specify MIME type and data.
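Because data must be raw base64, a string that arrives as a data URL (for example from a browser upload, or one built for the OpenAI format) needs its prefix removed first. The sketch below shows one way to normalize both cases. The helper name toInlineDataPart and its fallback parameter are hypothetical, not part of the @google/genai SDK.

```typescript
// Hypothetical helper (not part of @google/genai): accepts either raw base64
// or a data URL and returns a part shaped like the inlineData example above.
interface InlineDataPart {
  inlineData: { mimeType: string; data: string };
}

// Matches "data:<mime>;base64,<payload>" and captures the MIME type and payload.
const DATA_URL_PATTERN = /^data:([^;,]+);base64,([\s\S]+)$/;

function toInlineDataPart(input: string, fallbackMimeType = "image/jpeg"): InlineDataPart {
  const trimmed = input.trim();
  const match = DATA_URL_PATTERN.exec(trimmed);
  if (match) {
    // Data URL: take the MIME type from the prefix and keep only the payload.
    return { inlineData: { mimeType: match[1], data: match[2] } };
  }
  // Already raw base64: pass it through with the caller-supplied MIME type.
  return { inlineData: { mimeType: fallbackMimeType, data: trimmed } };
}
```

The returned object can be placed directly in a parts array next to a text part.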
2. Parts Array
Content is organized as a parts array:
    contents: {
      role: "user",
      parts: [
        { text: "Your prompt here" },
        { inlineData: { mimeType: "image/jpeg", data: base64Image } }
      ]
    }
Flexible: Mix text and images in any order.
3. MIME Types
Specify the correct MIME type:
| File Format | MIME Type |
|---|---|
| JPEG | image/jpeg |
| PNG | image/png |
| WebP | image/webp |
| GIF | image/gif |
    import path from "path";

    // Helper function: map a file extension to its MIME type
    const getMimeType = (filepath: string): string => {
      const ext = path.extname(filepath).toLowerCase();
      const mimeTypes: Record<string, string> = {
        ".jpg": "image/jpeg",
        ".jpeg": "image/jpeg",
        ".png": "image/png",
        ".webp": "image/webp",
        ".gif": "image/gif",
      };
      // Unknown extensions fall back to JPEG
      return mimeTypes[ext] || "image/jpeg";
    };
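An extension lookup trusts the file name, so a PNG saved as photo.jpg would be sent with the wrong MIME type. As a complementary check, the sketch below infers the type from the file's leading bytes using the well-known signatures for the four formats in the table. The function name sniffImageMimeType is hypothetical.

```typescript
// Hypothetical helper: infer the MIME type from the file's leading bytes
// instead of trusting its extension. Returns undefined for unknown formats.
function sniffImageMimeType(bytes: Uint8Array): string | undefined {
  // True when `signature` appears in `bytes` starting at `offset`.
  const startsWith = (signature: number[], offset = 0): boolean =>
    signature.every((byte, i) => bytes[offset + i] === byte);

  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "image/gif"; // "GIF8"
  // WebP is a RIFF container: "RIFF" at byte 0 and "WEBP" at byte 8.
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp";
  }
  return undefined;
}
```

A Node Buffer is a Uint8Array, so the result of fs.readFileSync(imagePath) can be passed in directly.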
4. System Instructions in Config
System instruction is separate from content:
    const response = await gemini.models.generateContent({
      model: "gemini-3-flash-preview",
      contents: { /* user content */ },
      config: {
        systemInstruction: "You are a naturalist...",
      },
    });
Clean separation between user content and system configuration.
5. Simple Response Access
Direct text access with response.text:
    // No filtering needed!
    console.log(response.text);

    // Token metadata
    console.log(response.usageMetadata);
Most straightforward of all three providers.
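For intuition, response.text can be thought of as a shortcut over the underlying candidates structure: to the best of my understanding of the SDK, it joins the text parts of the first candidate. The sketch below does that by hand. The ResponseLike types are simplified stand-ins written for this example, not the SDK's real types, and firstCandidateText is a hypothetical name.

```typescript
// Minimal structural types for illustration; the real SDK types are richer.
interface TextPartLike {
  text?: string;
}
interface ResponseLike {
  candidates?: { content?: { parts?: TextPartLike[] } }[];
}

// Hypothetical helper: join the text parts of the first candidate by hand.
function firstCandidateText(response: ResponseLike): string | undefined {
  const parts = response.candidates?.[0]?.content?.parts;
  if (!parts || parts.length === 0) return undefined;
  // Non-text parts have no `text` field and contribute nothing.
  const text = parts.map((part) => part.text ?? "").join("");
  return text.length > 0 ? text : undefined;
}
```

In everyday code the built-in accessor is the better choice; a manual walk like this is mainly useful when inspecting non-text parts or more than one candidate.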
Three-Way Comparison
OpenAI
    {
      type: "input_image",
      image_url: `data:image/jpeg;base64,${base64}`,
      detail: "auto"
    }
    // Response: response.output_text
Anthropic
    {
      type: "image",
      source: {
        type: "base64",
        media_type: "image/jpeg",
        data: base64
      }
    }
    // Response: response.content.filter(...).map(...).join()
Gemini (Simplest!)
    {
      inlineData: {
        mimeType: "image/jpeg",
        data: base64
      }
    }
    // Response: response.text
Image Requirements
Supported Formats
- ✅ JPEG (.jpg, .jpeg)
- ✅ PNG (.png)
- ✅ WebP (.webp)
- ✅ GIF (.gif)
Size Limits
- Max file size: ~20MB
- Recommended: Under 5MB for optimal speed
- Resolution: Automatically optimized
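One detail worth keeping in mind with size limits: base64 encoding turns every 3 bytes into 4 characters, so the payload sent to the API is roughly a third larger than the file on disk. The sketch below computes the encoded size and compares it with a cap. Treating the ~20MB figure as a limit on the encoded payload is an assumption made for this example, and the names base64Length, fitsInlineLimit, and ASSUMED_MAX_ENCODED_BYTES are hypothetical.

```typescript
// Base64 encodes every 3 input bytes as 4 output characters (with padding).
function base64Length(byteCount: number): number {
  return Math.ceil(byteCount / 3) * 4;
}

// Assumption: the ~20MB figure is treated as a cap on the encoded payload.
const ASSUMED_MAX_ENCODED_BYTES = 20 * 1024 * 1024;

// True when a file of `byteCount` bytes still fits under `limit` once encoded.
function fitsInlineLimit(byteCount: number, limit = ASSUMED_MAX_ENCODED_BYTES): boolean {
  return base64Length(byteCount) <= limit;
}
```

Under that assumption a 15MB file encodes to exactly 20MB and just fits, while a 16MB file does not.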
Optimization Example
    import sharp from "sharp";

    // Optimize for Gemini
    const optimizedImage = await sharp("large-image.jpg")
      .resize(2048, 2048, { fit: "inside" })
      .jpeg({ quality: 85 })
      .toBuffer();

    const base64Image = optimizedImage.toString("base64");
Token Usage and Costs
Understanding Gemini's Token Metadata
    // usageMetadata is optional on the response, so use optional chaining
    console.log("Prompt tokens:", response.usageMetadata?.promptTokenCount);
    console.log("Response tokens:", response.usageMetadata?.candidatesTokenCount);
    console.log("Total tokens:", response.usageMetadata?.totalTokenCount);
Token Breakdown:
- Text prompt: ~40 tokens
- Image processing: ~1100+ tokens
- System instruction: ~15 tokens
- Prompt subtotal (promptTokenCount): ~1155 tokens
- Response (candidatesTokenCount): ~200 tokens
- Total: ~1350+ tokens
Cost Calculation (Gemini 3 Flash)
Input: 1156 tokens × $0.XX/1M = $0.00XX
Output: 198 tokens × $0.XX/1M = $0.00XX
Total: Very cost-effective!
Gemini Flash is optimized for speed and cost efficiency.
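The calculation above can be turned into a small function that takes the per-million-token rates as parameters, so the current prices from Google's pricing page can be plugged in. This is a sketch only: estimateCostUsd and the two interfaces are hypothetical names, and the rates in the example are placeholders chosen to make the arithmetic visible, not real prices.

```typescript
// Simplified view of the usage fields shown in the expected output.
interface UsageLike {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
}

// Rates in USD per one million tokens.
interface RatesPerMillion {
  input: number;
  output: number;
}

// Hypothetical helper: cost = tokens / 1M × rate, summed over input and output.
function estimateCostUsd(usage: UsageLike, rates: RatesPerMillion): number {
  const inputCost = ((usage.promptTokenCount ?? 0) / 1e6) * rates.input;
  const outputCost = ((usage.candidatesTokenCount ?? 0) / 1e6) * rates.output;
  return inputCost + outputCost;
}

// Placeholder rates chosen only to illustrate the arithmetic; not real prices.
const placeholderRates: RatesPerMillion = { input: 1, output: 2 };
const cost = estimateCostUsd(
  { promptTokenCount: 1156, candidatesTokenCount: 198 },
  placeholderRates
);
console.log(cost.toFixed(6)); // 1156/1M × 1 + 198/1M × 2
```

Passing response.usageMetadata straight into the function works because missing counts are treated as zero.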
Configuration Options
Basic Configuration
    config: {
      systemInstruction: "You are a helpful assistant",
      temperature: 0.7,
      maxOutputTokens: 1000,
      topP: 0.95,
      topK: 40
    }
Temperature Control
    // Precise, factual (travel guides, nature docs)
    config: { temperature: 0.2 }

    // Balanced (general analysis)
    config: { temperature: 0.7 }

    // Creative (artistic descriptions)
    config: { temperature: 1.2 }
Token Limits
    // Short responses
    config: { maxOutputTokens: 500 }

    // Medium responses
    config: { maxOutputTokens: 1000 }

    // Long analyses
    config: { maxOutputTokens: 2000 }
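If the same temperature and token-limit combinations recur across an app, they can be collected into named presets. The sketch below uses the values from this section; the type names and the generationSettings function are hypothetical conveniences, not SDK features.

```typescript
type AnalysisStyle = "factual" | "balanced" | "creative";
type ResponseLength = "short" | "medium" | "long";

interface GenerationSettings {
  temperature: number;
  maxOutputTokens: number;
}

// Values taken from the temperature and token-limit examples in this section.
const TEMPERATURES: Record<AnalysisStyle, number> = {
  factual: 0.2,
  balanced: 0.7,
  creative: 1.2,
};
const TOKEN_LIMITS: Record<ResponseLength, number> = {
  short: 500,
  medium: 1000,
  long: 2000,
};

// Hypothetical helper: combine one style preset with one length preset.
function generationSettings(style: AnalysisStyle, length: ResponseLength): GenerationSettings {
  return { temperature: TEMPERATURES[style], maxOutputTokens: TOKEN_LIMITS[length] };
}
```

The result can be spread into a request, for example config: { systemInstruction: "You are a naturalist.", ...generationSettings("factual", "short") }.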
Error Handling
Common Errors
    try {
      const response = await gemini.models.generateContent({
        /* model, contents, config */
      });
    } catch (error) {
      if (error instanceof ApiError) {
        switch (error.status) {
          case 400:
            console.log("Invalid request - check image format");
            break;
          case 401:
            console.log("Invalid API key");
            break;
          case 413:
            console.log("Image too large");
            break;
          case 429:
            console.log("Rate limit exceeded");
            break;
          case 503:
            console.log("Service temporarily unavailable");
            break;
          default:
            console.log("API Error:", error.message);
        }
      } else {
        // Not an API error (for example, a missing image file): rethrow
        throw error;
      }
    }
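Of the statuses in the switch above, 429 and 503 are usually transient, so a retry with exponential backoff is a common response. The sketch below is one possible shape for that. All names (isRetryable, backoffDelayMs, withRetry) and the default delays are assumptions for illustration, and the status check is structural so the example does not depend on the SDK.

```typescript
// Statuses from the switch above that are usually worth retrying.
const RETRYABLE_STATUSES = [429, 503];

// True when `error` carries a numeric `status` that is in the retryable list.
function isRetryable(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const status = (error as { status?: unknown }).status;
  return typeof status === "number" && RETRYABLE_STATUSES.indexOf(status) !== -1;
}

// Exponential backoff: 500ms, 1000ms, 2000ms, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Runs `call`, retrying retryable failures until maxAttempts is reached.
async function withRetry<T>(call: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      const isLastAttempt = attempt >= maxAttempts - 1;
      if (isLastAttempt || !isRetryable(error)) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

A request would then be wrapped as withRetry(() => gemini.models.generateContent({ ... })), with non-retryable errors such as 400 or 401 surfacing immediately.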
Validation Best Practices
    import fs from "fs";
    import path from "path";

    // Validate before sending
    const validateImage = (imagePath: string) => {
      // 1. File exists?
      if (!fs.existsSync(imagePath)) {
        throw new Error("Image not found");
      }

      // 2. Size check
      const stats = fs.statSync(imagePath);
      if (stats.size > 20 * 1024 * 1024) {
        throw new Error("Image too large (max 20MB)");
      }

      // 3. Valid format?
      const ext = path.extname(imagePath).toLowerCase();
      const validExts = [".jpg", ".jpeg", ".png", ".webp", ".gif"];
      if (!validExts.includes(ext)) {
        throw new Error(`Unsupported format: ${ext}`);
      }

      return true;
    };
Practice Exercises
1. Multi-Image Comparison
    parts: [
      { text: "Compare these travel destinations:" },
      { inlineData: { mimeType: "image/jpeg", data: image1Base64 } },
      { text: "vs" },
      { inlineData: { mimeType: "image/jpeg", data: image2Base64 } },
    ];
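For more than two images, writing the parts array by hand gets repetitive. One possible variation is to generate it from a list and put a short text label before each image, so the model can refer to the images by name. The helper buildComparisonParts and the LabeledImage type are hypothetical names used only for this sketch.

```typescript
type PromptPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface LabeledImage {
  label: string;
  mimeType: string;
  data: string; // raw base64
}

// Hypothetical helper: instruction first, then a text label before each image.
function buildComparisonParts(instruction: string, images: LabeledImage[]): PromptPart[] {
  const parts: PromptPart[] = [{ text: instruction }];
  for (const image of images) {
    parts.push({ text: `${image.label}:` });
    parts.push({ inlineData: { mimeType: image.mimeType, data: image.data } });
  }
  return parts;
}
```

The returned array can be used as the parts value of a contents object with role "user".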
2. Structured Output Request
    parts: [
      {
        text: `Analyze this wildlife image and respond in JSON format:
    {
      "species": "scientific name",
      "habitat": "description",
      "behaviors": ["list", "of", "behaviors"],
      "threats": ["conservation", "concerns"]
    }`,
      },
      { inlineData: { mimeType: "image/jpeg", data: imageData } },
    ];
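When JSON is requested through the prompt alone, the reply may arrive wrapped in a Markdown code fence or with a sentence of commentary around it. A minimal way to cope, sketched below, is to parse the span from the first opening brace to the last closing brace. The name parseJsonReply is hypothetical, and the approach assumes the reply contains a single top-level JSON object.

```typescript
// Hypothetical helper: pull a JSON object out of a model reply that may be
// wrapped in a Markdown code fence or surrounded by commentary.
function parseJsonReply<T = unknown>(reply: string): T {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in reply");
  }
  // JSON.parse throws a SyntaxError if the extracted span is not valid JSON.
  return JSON.parse(reply.slice(start, end + 1)) as T;
}
```

The type parameter only asserts a shape at compile time; the fields should still be validated at runtime before use.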
3. Detailed Travel Analysis
    config: {
      systemInstruction: "You are an enthusiastic travel guide"
    },
    contents: {
      role: "user",
      parts: [
        {
          text: "Identify this landmark and provide: 1) Historical significance, 2) Best visiting times, 3) Nearby attractions"
        },
        { inlineData: { mimeType: "image/jpeg", data: landmarkImage } }
      ]
    }
4. Safety Assessment
    config: {
      systemInstruction: "You are a safety inspector. Be thorough but not alarmist."
    },
    contents: {
      role: "user",
      parts: [
        { text: "Identify any safety concerns in this environment" },
        { inlineData: { mimeType: "image/jpeg", data: sceneImage } }
      ]
    }
Gemini's Strengths
1. Speed
Gemini Flash is optimized for fast responses:
    // Typical response times
    Text-only:  ~1-2 seconds
    With image: ~2-4 seconds
2. Cost Efficiency
Great balance of quality and cost:
    // High-volume applications
    // (analyzeWithGemini stands for your own wrapper around generateContent)
    for (const image of imagesBatch) {
      const analysis = await analyzeWithGemini(image);
      // Cost per image: ~$0.001-0.002
    }
3. Simple API Design
Cleanest response access:
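    // Each provider's response object is named separately here so the
    // snippet does not redeclare the same variables.

    // OpenAI
    const openaiText = openaiResponse.output_text;

    // Anthropic
    const anthropicText = anthropicResponse.content
      .filter((b) => b.type === "text")
      .map((b) => b.text)
      .join("\n");

    // Gemini (simplest!)
    const geminiText = geminiResponse.text;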
Use Cases
1. Travel App - Quick Landmark ID
    const identifyLandmark = async (imageBase64: string) => {
      const response = await gemini.models.generateContent({
        model: "gemini-3-flash-preview",
        contents: {
          role: "user",
          parts: [
            { text: "Identify this landmark in 2-3 sentences" },
            { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
          ],
        },
        config: { maxOutputTokens: 200 },
      });

      return response.text;
    };
2. Wildlife Spotter App
    const identifyAnimal = async (photoBase64: string, location: string) => {
      const response = await gemini.models.generateContent({
        model: "gemini-3-flash-preview",
        contents: {
          role: "user",
          parts: [
            {
              text: `Identify the animal species. Location: ${location}. Include conservation status.`,
            },
            { inlineData: { mimeType: "image/jpeg", data: photoBase64 } },
          ],
        },
        config: {
          systemInstruction:
            "You are a wildlife expert. Be concise but informative.",
        },
      });

      return response.text;
    };
3. Accessibility Alt Text Generator
    const generateAltText = async (imageBase64: string) => {
      const response = await gemini.models.generateContent({
        model: "gemini-3-flash-preview",
        contents: {
          role: "user",
          parts: [
            {
              text: "Create a concise, descriptive alt text for screen readers (max 125 characters)",
            },
            { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
          ],
        },
        config: { maxOutputTokens: 100 },
      });

      // response.text can be undefined, so fall back to an empty string
      return (response.text ?? "").slice(0, 125);
    };
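A hard slice(0, 125) can cut the alt text in the middle of a word. A gentler variant, sketched below, collapses whitespace and backs up to the last full word before appending an ellipsis. The function name truncateAltText is hypothetical.

```typescript
// Hypothetical helper: trim to maxLength without cutting a word in half.
function truncateAltText(text: string, maxLength = 125): string {
  // Collapse runs of whitespace (including newlines) into single spaces.
  const normalized = text.trim().replace(/\s+/g, " ");
  if (normalized.length <= maxLength) return normalized;
  // Reserve one character for the ellipsis, then back up to the last space.
  const clipped = normalized.slice(0, maxLength - 1);
  const lastSpace = clipped.lastIndexOf(" ");
  const base = lastSpace > 0 ? clipped.slice(0, lastSpace) : clipped;
  return `${base}…`;
}
```

It can replace the final line of generateAltText, for example return truncateAltText(response.text ?? "").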
Provider Selection Guide
Choose Gemini When:
- ✅ Speed matters - Real-time applications
- ✅ High volume - Processing many images
- ✅ Cost-conscious - Budget-friendly option
- ✅ Simple integration - Easy API design
- ✅ Google ecosystem - Already using GCP/Firebase
Choose OpenAI When:
- ✅ Detail control needed (detail parameter)
- ✅ Existing OpenAI integration
- ✅ GPT-4 level quality required
Choose Anthropic When:
- ✅ Detailed reasoning needed
- ✅ Safety-critical applications
- ✅ Long, nuanced analysis required
Key Takeaways
- ✅ Gemini uses inlineData with mimeType
- ✅ Content organized as parts array
- ✅ System instruction in separate config object
- ✅ Simplest response access with .text
- ✅ Excellent speed/cost balance
- ✅ Token metadata via usageMetadata
- ✅ Clean, straightforward API design
Module Complete!
Congratulations! You've now mastered image analysis across all three major AI providers:
- OpenAI - Flexible, detail control
- Anthropic - Detailed, thoughtful analysis
- Gemini - Fast, cost-effective
What's Next?
You can now:
- ✅ Choose the right provider for each task
- ✅ Implement multi-provider fallback systems
- ✅ Build production multimodal applications
- ✅ Optimize for cost vs quality tradeoffs
Quick Reference
Minimal Working Example
    import { GoogleGenAI } from "@google/genai";
    import fs from "fs";

    const gemini = new GoogleGenAI({});
    const base64Image = fs.readFileSync("image.jpg", "base64");

    const response = await gemini.models.generateContent({
      model: "gemini-3-flash-preview",
      contents: {
        role: "user",
        parts: [
          { text: "What's in this image?" },
          { inlineData: { mimeType: "image/jpeg", data: base64Image } },
        ],
      },
    });

    console.log(response.text);
Common Pitfalls
- ❌ Wrong mimeType for file format
- ❌ Forgetting raw base64 (no data URL prefix)
- ❌ Not handling ApiError properly
- ❌ Accessing response.output instead of response.text
Completed Module 3! You're now a multimodal AI expert! 🎉🖼️