Lesson 2 Media Prompts | Module 3

Comprehensive guide to working with images across OpenAI, Anthropic, and Google Gemini.

Media Prompts: Multi-Provider Image Analysis

This lesson demonstrates how to work with images across all three AI providers. You'll see how each provider structures image input differently while achieving the same goal: analyzing visual content with AI.

Lesson Structure

Each sub-lesson covers a specific provider's approach to image analysis. Follow them in order to understand the differences and similarities in multimodal AI development.

📸 Media Prompt Overview

Common Elements Across All Providers

Despite different APIs, all three providers share core concepts:

Image Encoding - Convert images to base64 format
Multimodal Messages - Combine text and images in prompts
System Instructions - Guide the AI's analysis approach
Response Parsing - Extract the generated text from each provider's response object
Token Usage - Track API costs and limits
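Every provider needs the same two values for an inline image: the base64 string and its MIME type. The following is a minimal sketch of a shared loader, assuming Node.js. `loadImage`, `mimeTypeFromPath`, and `toBase64` are hypothetical helper names, not part of any SDK. The extension table is deliberately limited to JPEG, PNG, and WebP, which all three providers list as supported. Check each provider's documentation for the full list.

```typescript
import { readFile } from "node:fs/promises";
import { extname } from "node:path";

// Extensions mapped to MIME types that OpenAI, Anthropic, and Gemini
// all list as supported for inline images.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
};

// Infer the MIME type from the file extension (case-insensitive).
function mimeTypeFromPath(path: string): string {
  const mime = MIME_BY_EXTENSION[extname(path).toLowerCase()];
  if (!mime) throw new Error(`Unsupported image type: ${path}`);
  return mime;
}

// Encode raw bytes as a plain base64 string (no data-URL prefix).
function toBase64(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// Read an image from disk and return the two values every provider needs.
async function loadImage(path: string): Promise<{ base64: string; mimeType: string }> {
  const mimeType = mimeTypeFromPath(path); // fail fast before reading the file
  const bytes = await readFile(path);
  return { base64: toBase64(bytes), mimeType };
}
```

For example, `const { base64, mimeType } = await loadImage("src/assets/lizard.jpg");` yields the values each provider-specific snippet below plugs into its own request shape.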

Our Example Task

All three implementations analyze the same image with the same prompt:

System Instruction:

"You are a naturalist. Provide detailed information about the flora and fauna in the image."

User Prompt:

"Analyze the image and describe the species present, their behaviors, and any notable ecological interactions. The photo was taken in the Galápagos Islands."

Image: A lizard photo from src/assets/lizard.jpg

This consistency lets you directly compare implementation approaches.

📚 Provider-Specific Implementations

Lesson 2a: OpenAI Media Prompts

Analyzing images with OpenAI's Responses API.

Use openai.responses.create() with multimodal input
Structure input_text and input_image in content array
Handle output_text response format
Understand OpenAI's base64 data URL approach
Key Feature: Direct image_url string format
Model Used: gpt-4o-mini
Code: src/openai/media-prompt.ts

API Structure Highlights:

{
  type: "input_image",
  image_url: `data:image/jpeg;base64,${base64Image}`,
  detail: "auto" // "low" | "high" | "auto": controls the resolution the model sees
}

Lesson 2b: Anthropic Media Prompts

Analyzing images with Claude through Anthropic's Messages API.

Use anthropic.messages.create() with image content blocks
Structure nested source object for images
Specify explicit media_type and data properties
Extract text from content array response
Key Feature: Structured source object with metadata
Model Used: claude-haiku-4-5
Code: src/anthropic/media-prompt.ts

API Structure Highlights:

{
  type: "image",
  source: {
    type: "base64",
    media_type: "image/jpeg",
    data: base64Image
  }
}

Lesson 2c: Gemini Media Prompts

Analyzing images with Google's Gemini API.

Use gemini.models.generateContent() with parts array
Structure inlineData object for images
Use systemInstruction in config for role guidance
Access simple .text response property
Key Feature: inlineData with mimeType in parts
Model Used: gemini-3-flash-preview
Code: src/gemini/media-prompt.ts

API Structure Highlights:

{
  inlineData: {
    mimeType: "image/jpeg",
    data: base64Image
  }
}

🔄 Side-by-Side Comparison

Request Structure

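| Provider | Method | Image field | MIME type |
|---|---|---|---|
| OpenAI | `responses.create()` | `image_url` (data URL string) | Inline in the data URL prefix |
| Anthropic | `messages.create()` | `source` object; base64 string in `source.data` | Separate `source.media_type` field |
| Gemini | `models.generateContent()` | `inlineData` object; base64 string in `inlineData.data` | Separate `inlineData.mimeType` field |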

Response Structure

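| Provider | Access pattern | Format |
|---|---|---|
| OpenAI | `response.output_text` | String (SDK convenience property that aggregates all text output) |
| Anthropic | `response.content[0].text` | Array of content blocks; check `type === "text"` before reading `.text` |
| Gemini | `response.text` | String (may be `undefined` when no text is returned) |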
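Reading `content[0].text` assumes the first Anthropic block is text. A slightly more defensive approach filters by block type. Below is a minimal sketch of one extractor for all three shapes in the table above. The response types are deliberately trimmed to the fields this sketch reads and are assumptions, not the SDKs' full type definitions.

```typescript
// Minimal, assumed response shapes: only the fields this sketch reads.
type OpenAIResponse = { output_text: string };
type AnthropicResponse = { content: Array<{ type: string; text?: string }> };
type GeminiResponse = { text?: string };

type ProviderResponse =
  | { provider: "openai"; response: OpenAIResponse }
  | { provider: "anthropic"; response: AnthropicResponse }
  | { provider: "gemini"; response: GeminiResponse };

function extractText(r: ProviderResponse): string {
  switch (r.provider) {
    case "openai":
      // The SDK's convenience property already joins all text output.
      return r.response.output_text;
    case "anthropic":
      // Keep only text blocks instead of assuming content[0] is text.
      return r.response.content
        .filter((block) => block.type === "text")
        .map((block) => block.text ?? "")
        .join("");
    case "gemini":
      // .text can be undefined when no candidate text comes back.
      return r.response.text ?? "";
  }
}
```

The discriminated union lets TypeScript check that every provider is handled. Adding a fourth provider to `ProviderResponse` without a matching `case` becomes a compile error.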

System Instructions

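| Provider | Location | Parameter name |
|---|---|---|
| OpenAI | In the `input` array (the top-level `instructions` parameter also works) | `role: "system"` |
| Anthropic | Top-level | `system` |
| Gemini | In `config` | `systemInstruction` |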
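The three tables above describe one request in three shapes. As a minimal sketch, the hypothetical builders below produce plain request objects from one shared input, using the lesson's model names. `ImagePrompt`, `openAIRequest`, `anthropicRequest`, and `geminiRequest` are illustrative names. `max_tokens: 1024` is an assumed value: the Messages API requires a `max_tokens`, but the lesson does not state which one it uses. These are plain objects for side-by-side comparison. Passing them directly to an SDK may need extra type narrowing; for example, Anthropic's `media_type` is typed as a union of specific MIME strings.

```typescript
// Hypothetical shared input for all three providers.
interface ImagePrompt {
  system: string;
  prompt: string;
  base64: string;
  mimeType: string; // e.g. "image/jpeg"
}

// OpenAI Responses API: system message inside `input`, image as a data URL.
function openAIRequest(p: ImagePrompt) {
  return {
    model: "gpt-4o-mini",
    input: [
      { role: "system", content: p.system },
      {
        role: "user",
        content: [
          { type: "input_text", text: p.prompt },
          { type: "input_image", image_url: `data:${p.mimeType};base64,${p.base64}`, detail: "auto" },
        ],
      },
    ],
  };
}

// Anthropic Messages API: top-level `system`, image as a base64 source block.
function anthropicRequest(p: ImagePrompt) {
  return {
    model: "claude-haiku-4-5",
    max_tokens: 1024, // required by the API; this value is an assumption
    system: p.system,
    messages: [
      {
        role: "user",
        content: [
          { type: "image", source: { type: "base64", media_type: p.mimeType, data: p.base64 } },
          { type: "text", text: p.prompt },
        ],
      },
    ],
  };
}

// Gemini generateContent: system instruction in `config`, image as inlineData.
function geminiRequest(p: ImagePrompt) {
  return {
    model: "gemini-3-flash-preview",
    contents: [
      { role: "user", parts: [{ inlineData: { mimeType: p.mimeType, data: p.base64 } }, { text: p.prompt }] },
    ],
    config: { systemInstruction: p.system },
  };
}
```

Seen together, the main differences are where the system text lives and how the MIME type travels. OpenAI encodes the MIME type in the URL string. Anthropic and Gemini use a separate field.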

🎯 When to Use Each Provider

OpenAI Vision (GPT-4o-mini)

Best For:

Quick image analysis
Integration with existing OpenAI workflows
When you need detail parameter control (low/high/auto)

Strengths:

Simple, direct data URL format
Fast response times
Good balance of quality and cost

Anthropic Vision (Claude Haiku 4.5)

Best For:

Detailed image analysis
Safety-critical applications
Long-form visual descriptions

Strengths:

Excellent reasoning about images
Strong safety and content filtering
Detailed, nuanced responses

Gemini Vision (Gemini 3 Flash)

Best For:

High-volume image processing
Cost-conscious applications
Google ecosystem integration

Strengths:

Very fast processing
Cost-effective
Simple response structure

💡 Best Practices for Image Prompts

1. Clear Instructions

// ✅ Good - Specific, actionable
"Identify the animal species and describe its distinctive features.";

// ❌ Bad - Vague
"Tell me about this.";

2. Provide Context

// ✅ Good - Context helps
"Analyze this medical scan for abnormalities. This is a chest X-ray.";

// ❌ Bad - No context
"What do you see?";

3. Structure Your Request

// ✅ Good - Organized
"Please: 1) Identify all objects, 2) Describe spatial relationships, 3) Note any text";

// ❌ Bad - Unstructured
"Tell me everything about this image";
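The three practices above can be combined in one prompt. The following is a minimal sketch of a hypothetical `buildImagePrompt` helper (not part of any SDK) that places a specific task first, then the context, then numbered steps in the "Please: 1) ..., 2) ..." style shown above.

```typescript
// Hypothetical helper applying practices 1 to 3:
// a specific task, explicit context, and numbered steps.
interface ImagePromptSpec {
  task: string;     // what to do, stated specifically
  context?: string; // what the image is or where it came from
  steps?: string[]; // optional ordered sub-tasks
}

function buildImagePrompt({ task, context, steps = [] }: ImagePromptSpec): string {
  const parts = [task.trim()];
  if (context) parts.push(context.trim());
  if (steps.length > 0) {
    // Number the steps so the model can answer them in order.
    parts.push("Please: " + steps.map((step, i) => `${i + 1}) ${step}`).join(", "));
  }
  return parts.join(" ");
}
```

For example, `buildImagePrompt({ task: "Identify the animal species.", context: "The photo was taken in the Galápagos Islands.", steps: ["Describe behaviors", "Note ecological interactions"] })` returns `"Identify the animal species. The photo was taken in the Galápagos Islands. Please: 1) Describe behaviors, 2) Note ecological interactions"`.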

4. Optimize Image Size

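import { readFile } from "node:fs/promises";
import sharp from "sharp";

// ✅ Good - Downscale to fit within 1024×1024 (aspect ratio preserved) and re-encode as JPEG
const buffer = await readFile("src/assets/lizard.jpg");
const image = await sharp(buffer)
  .resize(1024, 1024, { fit: "inside", withoutEnlargement: true }) // never upscale small images
  .jpeg({ quality: 80 })
  .toBuffer();
const base64Image = image.toString("base64");

// ❌ Bad - Unnecessarily large
// Sending 10MB images when 1MB would work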
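Base64 makes the size problem worse. It encodes every 3 bytes as 4 characters and pads the final group, so the request body is about a third larger than the image file. The following is a small sketch for estimating the encoded size before sending. `fitsBudget` and its byte budget are hypothetical; use each provider's documented limits as the real budget.

```typescript
// Base64 turns every 3 input bytes into 4 output characters,
// padding the final group, so the length is 4 * ceil(n / 3).
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical pre-flight check: does the encoded image fit a byte budget?
function fitsBudget(byteLength: number, maxEncodedBytes: number): boolean {
  return base64Length(byteLength) <= maxEncodedBytes;
}
```

For example, `base64Length(10 * 1024 * 1024)` is 13,981,016. A 10 MiB file therefore becomes roughly 13.3 MiB of base64 text before any JSON overhead.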

🔧 Common Patterns

Multi-Provider Fallback

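// analyzeWithOpenAI, analyzeWithAnthropic, and analyzeWithGemini are assumed
// per-provider helpers that each return Promise<string>.
async function analyzeImage(imageBase64: string, prompt: string): Promise<string> {
  try {
    // Try the primary provider first
    return await analyzeWithOpenAI(imageBase64, prompt);
  } catch (openaiError) {
    console.warn("OpenAI failed, trying Anthropic...", openaiError);
    try {
      return await analyzeWithAnthropic(imageBase64, prompt);
    } catch (anthropicError) {
      console.warn("Anthropic failed, trying Gemini...", anthropicError);
      // If Gemini also fails, its error propagates to the caller
      return await analyzeWithGemini(imageBase64, prompt);
    }
  }
}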
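Nested `try` blocks grow with each added provider. The same fallback chain can be expressed as a loop over an ordered list. The following is a minimal sketch. `firstSuccessful` and `Attempt` are hypothetical names. The sketch collects every failure into an `AggregateError` (available in Node.js 15+ and ES2021), so the caller sees why each provider failed, not only the last one.

```typescript
// One named provider call; `run` performs the request.
type Attempt<T> = { name: string; run: () => Promise<T> };

// Try each attempt in order and return the first successful result.
async function firstSuccessful<T>(attempts: Attempt<T>[]): Promise<T> {
  const errors: unknown[] = [];
  for (const { name, run } of attempts) {
    try {
      return await run();
    } catch (err) {
      console.warn(`${name} failed, trying the next provider...`);
      errors.push(err);
    }
  }
  // Every provider failed: surface all the underlying errors together.
  throw new AggregateError(errors, "All providers failed");
}
```

With the assumed helpers from the snippet above, the chain becomes `firstSuccessful([{ name: "OpenAI", run: () => analyzeWithOpenAI(img, prompt) }, { name: "Anthropic", run: () => analyzeWithAnthropic(img, prompt) }, { name: "Gemini", run: () => analyzeWithGemini(img, prompt) }])`. Reordering providers is then a data change rather than a code change.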

Provider Selection Based on Task

type Provider = "openai" | "anthropic" | "gemini";

function selectProvider(taskType: string): Provider {
  switch (taskType) {
    case "detailed-analysis":
      return "anthropic"; // Claude's reasoning
    case "quick-scan":
      return "openai"; // gpt-4o-mini speed
    case "high-volume":
      return "gemini"; // Cost efficiency
    default:
      return "openai";
  }
}
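Task-based selection and fallback combine naturally. Put the chosen provider first and keep the others as backups. The following is a minimal sketch. `providerOrder` and `MODEL_BY_PROVIDER` are hypothetical names, and the model IDs are the ones this lesson uses.

```typescript
type Provider = "openai" | "anthropic" | "gemini";

// Models used in Lessons 2a, 2b, and 2c.
const MODEL_BY_PROVIDER: Record<Provider, string> = {
  openai: "gpt-4o-mini",
  anthropic: "claude-haiku-4-5",
  gemini: "gemini-3-flash-preview",
};

const ALL_PROVIDERS: Provider[] = ["openai", "anthropic", "gemini"];

// Preferred provider first, remaining providers as fallbacks in default order.
function providerOrder(preferred: Provider): Provider[] {
  return [preferred, ...ALL_PROVIDERS.filter((p) => p !== preferred)];
}
```

For a high-volume task, `providerOrder("gemini")` returns `["gemini", "openai", "anthropic"]`. That list can drive a fallback loop like the one in the previous pattern.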

📋 Code Checklist

Before running the media prompt examples:

✅ API Keys Set: Check .env for OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY
✅ Image Available: Confirm src/assets/lizard.jpg exists
✅ Dependencies Installed: Run pnpm install in project root
✅ Types Check: Run tsc --noEmit to confirm there are no type errors (tsx runs the scripts without type checking)
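The first two checklist items can be automated. The following is a minimal sketch of a hypothetical pre-flight script using the variable names and image path from the checklist. It assumes `.env` has already been loaded into `process.env` (for example with the `dotenv` package). Reading the file does not happen automatically.

```typescript
import { existsSync } from "node:fs";

// Variable names from the checklist above.
const REQUIRED_ENV = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"];

// Return the names of required variables that are missing or blank.
function missingEnv(env: Record<string, string | undefined>, required: string[] = REQUIRED_ENV): string[] {
  return required.filter((name) => !env[name]?.trim());
}

// Hypothetical pre-flight check: collect every problem instead of stopping at the first.
function preflight(imagePath = "src/assets/lizard.jpg"): string[] {
  const problems = missingEnv(process.env).map((name) => `Missing ${name}`);
  if (!existsSync(imagePath)) problems.push(`Image not found: ${imagePath}`);
  return problems;
}
```

An empty array from `preflight()` means the keys are set and the image exists. Otherwise, print the list and exit before making any API calls.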

🚀 Running the Examples

# OpenAI example
npx tsx src/openai/media-prompt.ts

# Anthropic example
npx tsx src/anthropic/media-prompt.ts

# Gemini example
npx tsx src/gemini/media-prompt.ts

Each script will:

Load the lizard image
Convert to base64
Send multimodal prompt
Display AI analysis
Show token usage

📊 Expected Output

All three providers should return a similar analysis. Exact wording and token counts vary by provider and run:

✅ Media Prompt Success!
AI Response: This image shows a marine iguana (Amblyrhynchus cristatus),
an endemic species to the Galápagos Islands. The iguana displays the
characteristic dark coloration and robust body adapted for marine life...

Tokens used: { input: 1245, output: 189, total: 1434 }
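Each SDK reports usage under different field names: `response.usage` for OpenAI and Anthropic, and `response.usageMetadata` for Gemini. The following is a minimal sketch of normalizing all three into the `{ input, output, total }` shape printed above. The usage types are trimmed to the fields read here. Anthropic reports no total, so the sketch derives it. For thinking models, Gemini's `totalTokenCount` may also count thinking tokens and so exceed input plus output; the sketch keeps the reported total when present.

```typescript
// Assumed usage shapes: only the fields this sketch reads.
type OpenAIUsage = { input_tokens: number; output_tokens: number; total_tokens: number };
type AnthropicUsage = { input_tokens: number; output_tokens: number };
type GeminiUsage = { promptTokenCount?: number; candidatesTokenCount?: number; totalTokenCount?: number };

interface TokenUsage { input: number; output: number; total: number }

function fromOpenAI(u: OpenAIUsage): TokenUsage {
  return { input: u.input_tokens, output: u.output_tokens, total: u.total_tokens };
}

// Anthropic has no total field, so add input and output.
function fromAnthropic(u: AnthropicUsage): TokenUsage {
  return { input: u.input_tokens, output: u.output_tokens, total: u.input_tokens + u.output_tokens };
}

// Gemini's counts are optional; default missing ones to 0 and prefer the reported total.
function fromGemini(u: GeminiUsage): TokenUsage {
  const input = u.promptTokenCount ?? 0;
  const output = u.candidatesTokenCount ?? 0;
  return { input, output, total: u.totalTokenCount ?? input + output };
}
```

For example, `fromAnthropic({ input_tokens: 1245, output_tokens: 189 })` returns `{ input: 1245, output: 189, total: 1434 }`.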

🎓 Key Takeaways

Different Structures, Same Goal - Each provider has unique API design
Base64 Encoding - The common format for sending images inline (each provider also accepts images by other means, such as URLs or uploaded files)
Cost vs Quality - Balance response quality with token costs
Error Handling - Implement fallbacks for production reliability
Provider Strengths - Match provider to task requirements

Navigation

Previous: Lesson 1: Module 3 Overview
Next: Lesson 2a: OpenAI Media Prompts
Module Index: AI SDK Essentials

Ready to dive into the code? Start with OpenAI in Lesson 2a!
