Module 3 - Lesson 2b: Anthropic Media Prompts
Analyzing images with Anthropic's Claude Vision API.
Published: 2/18/2026
Lesson 2b: Media Prompts with Anthropic Claude
Learn how to analyze images using Anthropic's Claude Vision API. This lesson demonstrates Claude's structured approach to multimodal inputs with explicit metadata handling.
What It Does
Sends an image along with text instructions to Claude and receives a detailed analysis in return. Claude handles nuanced visual understanding and step-by-step reasoning well.
Key Differences from OpenAI
- Structured Source Object: Images use a nested source object
- Explicit Metadata: Separate media_type field required
- Content Blocks Response: Response is an array of content blocks
- max_tokens Required: Must specify a maximum token limit
Code Example
The complete code is in src/anthropic/media-prompt.ts:
import Anthropic from "@anthropic-ai/sdk";
import dotenv from "dotenv";
import fs from "fs";

// Load environment variables
dotenv.config();

// Create Anthropic client with typed configuration
const anthropic = new Anthropic();

// Async function with proper return type
async function mediaPrompt(): Promise<void> {
  try {
    console.log("Testing Anthropic connection...");

    // Read image file and convert to base64
    const base64Image = fs.readFileSync("./src/assets/lizard.jpg", "base64");

    // Make API call - response is automatically typed!
    // Using a system prompt along with user prompt
    const response = await anthropic.messages.create({
      model: "claude-haiku-4-5",
      max_tokens: 1000,
      system:
        "You are a naturalist. Provide detailed information about the flora and fauna in the image.",
      messages: [
        {
          role: "user",
          content: [
            {
              type: "text",
              text: "Analyze the image and describe the species present, their behaviors, and any notable ecological interactions. The photo was taken in the Galápagos Islands.",
            },
            {
              type: "image",
              source: {
                type: "base64",
                media_type: "image/jpeg",
                data: base64Image,
              },
            },
          ],
        },
      ],
    });

    console.log("✅ Media Prompt Success!");

    // show response usage
    console.log("Tokens used:");
    console.dir(response.usage, { depth: null });

    // Check if we got a response
    if (!response.content || response.content.length === 0) {
      throw new Error("No content in response");
    }

    // Extract text
    // Type guard to ensure content is of expected type
    // Expecting response.content to be an array of objects with a 'type' and 'text' property
    const textBlocks = response.content.filter(
      (block) => block.type === "text",
    );

    if (textBlocks.length === 0) {
      throw new Error("No text content in response");
    }

    // TypeScript knows the structure of response
    console.log(
      "AI Response:",
      textBlocks.map((block) => block.text).join("\n"),
    );
  } catch (error) {
    // Proper error handling with type guards
    if (error instanceof Anthropic.APIError) {
      console.log("❌ API Error:", error.status, error.message);
    } else if (error instanceof Error) {
      console.log("❌ Error:", error.message);
    } else {
      console.log("❌ Unknown error occurred");
    }
  }
}

// Run the test
mediaPrompt().catch((error) => {
  console.error("Error:", error);
});
Run It
pnpm tsx src/anthropic/media-prompt.ts
Expected Output
Testing Anthropic connection...
✅ Media Prompt Success!
Tokens used:
{
input_tokens: 1289,
output_tokens: 245
}
AI Response: This image features a marine iguana (Amblyrhynchus cristatus),
a remarkable species that is endemic to the Galápagos Islands. As the world's
only marine lizard, this creature has evolved extraordinary adaptations that
enable it to forage for algae in the ocean...
Key Concepts
1. Structured Image Source
Anthropic uses a nested object structure for images:
{
  type: "image",
  source: {
    type: "base64",           // Source type
    media_type: "image/jpeg", // MIME type
    data: base64Image         // Base64 string (no data URL prefix!)
  }
}
Important: Don't include data:image/jpeg;base64, prefix - just the raw base64 string.
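Base64 strings that come from a browser (for example from FileReader or a canvas) often arrive with the data URL prefix already attached. The sketch below shows one way to normalize such input before placing it in the data field. The function name stripDataUrlPrefix is hypothetical and is not part of the Anthropic SDK.

```typescript
// Hypothetical helper (not part of the Anthropic SDK): returns the raw base64
// payload whether or not the input carries a data URL prefix.
function stripDataUrlPrefix(input: string): string {
  // Matches prefixes such as "data:image/jpeg;base64,"
  const match = /^data:[^;,]+;base64,/.exec(input);
  return match ? input.slice(match[0].length) : input;
}

console.log(stripDataUrlPrefix("data:image/png;base64,iVBORw0KGgo=")); // iVBORw0KGgo=
console.log(stripDataUrlPrefix("iVBORw0KGgo=")); // iVBORw0KGgo=
```

Passing already-clean input through the helper is harmless, so it can sit in front of every request.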
2. Media Types
Specify the correct MIME type:
| File Format | Media Type |
|---|---|
| JPEG | image/jpeg |
| PNG | image/png |
| WebP | image/webp |
| GIF | image/gif |
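import path from "path";

// Dynamic media type
const getMediaType = (filepath: string): string => {
  const ext = path.extname(filepath).toLowerCase();
  // Typed as a string-keyed record so the lookup compiles under strict mode
  const types: Record<string, string> = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".gif": "image/gif",
  };
  // Falls back to JPEG for unknown extensions
  return types[ext] || "image/jpeg";
};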
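A file extension can be wrong (a PNG saved as .jpg, for instance), and a mismatched media_type is one plausible cause of a rejected request. As an alternative sketch, the media type can be read from the file's leading bytes. The helper name sniffMediaType is hypothetical; the byte signatures are the standard ones for each format.

```typescript
type ImageMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

// Hypothetical helper: inspects the leading bytes ("magic numbers") of a file
// and returns the matching media type, or null if none of the four match.
function sniffMediaType(bytes: Uint8Array): ImageMediaType | null {
  const startsWith = (signature: number[], offset = 0): boolean =>
    signature.every((byte, i) => bytes[offset + i] === byte);

  // JPEG: FF D8 FF
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  // PNG: 89 "PNG" CR LF 1A LF
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  // GIF: "GIF8" (covers GIF87a and GIF89a)
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "image/gif";
  // WebP: "RIFF" at offset 0 and "WEBP" at offset 8
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp";
  }
  return null;
}

const pngHeader = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(sniffMediaType(pngHeader)); // image/png
```

In Node, the Buffer returned by fs.readFileSync is a Uint8Array, so it can be passed to this function directly.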
3. Content Blocks Response
Claude returns an array of content blocks:
// Response structure
{
  id: "msg_01ABC123",
  type: "message",
  role: "assistant",
  content: [
    {
      type: "text",
      text: "Your detailed analysis here..."
    }
  ],
  usage: {
    input_tokens: 1289,
    output_tokens: 245
  }
}
Why blocks? A response can contain more than one kind of block (for example, text alongside tool_use blocks when tools are enabled), so the array keeps each part separately typed.
4. Extracting Text Response
Always filter for text blocks:
// Safe extraction with type checking
const textBlocks = response.content.filter((block) => block.type === "text");

if (textBlocks.length === 0) {
  throw new Error("No text content in response");
}

// Join all text blocks
const fullText = textBlocks.map((block) => block.text).join("\n");
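Whether an inline filter callback narrows the block type depends on the TypeScript version in use. An explicit type guard makes the narrowing independent of that. The sketch below uses simplified stand-in types rather than the SDK's own (an assumption made so the example runs without the SDK installed); the names isTextBlock and extractText are hypothetical.

```typescript
// Simplified stand-ins for the SDK's content block types (assumption: the real
// union in @anthropic-ai/sdk has more members and more fields).
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;

// Explicit type guard: tells the compiler that surviving blocks are TextBlock.
function isTextBlock(block: ContentBlock): block is TextBlock {
  return block.type === "text";
}

// Joins all text blocks with newlines, throwing if there are none.
function extractText(content: ContentBlock[]): string {
  const textBlocks = content.filter(isTextBlock);
  if (textBlocks.length === 0) {
    throw new Error("No text content in response");
  }
  return textBlocks.map((block) => block.text).join("\n");
}

const sampleContent: ContentBlock[] = [
  { type: "text", text: "First paragraph." },
  { type: "tool_use", id: "toolu_example", name: "lookup", input: {} },
  { type: "text", text: "Second paragraph." },
];
console.log(extractText(sampleContent));
```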
Side-by-Side Comparison
OpenAI Approach
{
  type: "input_image",
  image_url: `data:image/jpeg;base64,${base64Image}`,
  detail: "auto"
}

// Response
const text = response.output_text;
Anthropic Approach
{
  type: "image",
  source: {
    type: "base64",
    media_type: "image/jpeg",
    data: base64Image // No data URL prefix!
  }
}

// Response
const textBlocks = response.content.filter(b => b.type === "text");
const text = textBlocks.map(b => b.text).join("\n");
Image Requirements
Supported Formats
- ✅ JPEG (.jpg, .jpeg)
- ✅ PNG (.png)
- ✅ WebP (.webp)
- ✅ GIF (.gif, non-animated)
Size Limits
- Max file size: 5MB (smaller than OpenAI!)
- Recommended: Under 3MB for faster processing
- Resolution: Large images are downscaled automatically before analysis, so very high resolutions add upload time without adding detail
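The size checks in this lesson measure raw file bytes, but the request body carries the base64 text, which is about one third larger. The sketch below computes the encoded length for a given byte count. Whether a given limit is applied to the raw or the encoded size is not something this lesson states, so treat the function (hypothetically named base64Length) as a planning aid only.

```typescript
// Padded base64 encodes every 3 input bytes as 4 output characters.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

const FIVE_MB = 5 * 1024 * 1024;
console.log(base64Length(FIVE_MB)); // 6990508
```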
Image Optimization
import sharp from "sharp";

// Compress for Anthropic's 5MB limit
const optimizedImage = await sharp("large-image.jpg")
  .resize(2048, 2048, { fit: "inside" })
  .jpeg({ quality: 85 })
  .toBuffer();

// Check size
if (optimizedImage.length > 5 * 1024 * 1024) {
  throw new Error("Image still too large after compression");
}

const base64Image = optimizedImage.toString("base64");
Token Usage and Costs
Understanding Token Consumption
Image inputs to Claude consume a significant number of tokens:
console.log("Input tokens:", response.usage.input_tokens); // ~1289
console.log("Output tokens:", response.usage.output_tokens); // ~245
Token Breakdown:
- Text prompt: ~50 tokens
- Image processing: ~1200+ tokens
- System instruction: ~20 tokens
- Total input: ~1289 tokens
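The image share of the breakdown above can be approximated before sending a request. Anthropic's vision documentation gives a rule of thumb of roughly (width × height) / 750 tokens per image; the sketch below applies it. The function name estimateImageTokens is hypothetical, and the result is an estimate, so the usage field on the response remains the authoritative count.

```typescript
// Rule of thumb from Anthropic's vision docs: tokens ≈ (width × height) / 750.
// Treat the result as an estimate, not a billing figure.
function estimateImageTokens(widthPx: number, heightPx: number): number {
  return Math.ceil((widthPx * heightPx) / 750);
}

console.log(estimateImageTokens(1000, 1000)); // 1334
console.log(estimateImageTokens(1092, 1092)); // 1590
```

By this estimate, the ~1200 image tokens in the example correspond to an image of roughly 900,000 pixels after any automatic resizing.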
Cost Calculation (Claude Haiku 4.5)
Input: 1289 tokens × $1.00/1M = $0.00129
Output: 245 tokens × $5.00/1M = $0.00123
Total: ~$0.0025 per analysis
Very cost-effective! Claude Haiku is optimized for high-volume multimodal tasks.
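The same arithmetic can be wrapped in a small function that takes the usage object from a response. This is a sketch: estimateCostUsd and the Rates shape are hypothetical names, and the rates in the example are placeholders for illustration, so substitute the published prices for the model you use.

```typescript
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Prices in US dollars per million tokens (hypothetical shape for this sketch).
interface Rates {
  inputUsdPerMTok: number;
  outputUsdPerMTok: number;
}

function estimateCostUsd(usage: Usage, rates: Rates): number {
  const inputCost = usage.input_tokens * rates.inputUsdPerMTok;
  const outputCost = usage.output_tokens * rates.outputUsdPerMTok;
  return (inputCost + outputCost) / 1e6;
}

// Placeholder rates (assumption): replace with the current price list.
const exampleRates: Rates = { inputUsdPerMTok: 1, outputUsdPerMTok: 5 };
const exampleCost = estimateCostUsd({ input_tokens: 1289, output_tokens: 245 }, exampleRates);
console.log(exampleCost.toFixed(6)); // 0.002514
```

Multiplying the per-analysis figure by the expected number of images per day gives a quick budget check before running a batch.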
System Instructions
The system prompt is a top-level parameter, not a message with a system role:
const response = await anthropic.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 1000,
  system: "You are a naturalist. Provide detailed information about the flora and fauna in the image.",
  messages: [/* ... */]
});
Effective System Prompts for Vision
// Travel guide
system: "You are an enthusiastic travel guide. Identify landmarks and share fascinating historical facts.";

// Art critic
system: "You are an art historian. Analyze composition, technique, and artistic style.";

// Safety inspector
system: "You are a safety inspector. Identify potential hazards and safety concerns.";

// Naturalist
system: "You are a naturalist. Provide scientific names, behaviors, and ecological context.";
Error Handling
Common Errors
try {
  const response = await anthropic.messages.create({
    /* request parameters as in the code example earlier in this lesson */
  });
} catch (error) {
  if (error instanceof Anthropic.APIError) {
    switch (error.status) {
      case 400:
        // Check: valid base64, correct media_type, image size
        console.log("Invalid request:", error.message);
        break;
      case 401:
        console.log("Invalid API key");
        break;
      case 413:
        // The whole request body exceeded the size limit; check image size
        console.log("Request too large");
        break;
      case 429:
        console.log("Rate limit exceeded");
        break;
      default:
        console.log("API Error:", error.message);
    }
  }
}
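Of the statuses above, 429 (and server-side 5xx errors) are usually worth retrying, while 400 and 401 are not. The SDK client accepts a maxRetries option and retries some failures on its own, so the sketch below is only one way to add an outer retry loop when more control is wanted. The names isRetryableStatus, backoffDelayMs, and withRetry are hypothetical and not part of the SDK.

```typescript
// Rate limits and server errors are treated as transient; 4xx client errors are not.
function isRetryableStatus(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500);
}

// Exponential backoff: 500 ms, 1000 ms, 2000 ms, ... for attempts 1, 2, 3, ...
function backoffDelayMs(attempt: number, baseDelayMs = 500): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

// Calls fn until it succeeds, a non-retryable error occurs, or attempts run out.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Errors with a numeric status field (such as API errors) drive the decision.
      const status =
        typeof error === "object" && error !== null
          ? (error as { status?: number }).status
          : undefined;
      if (!isRetryableStatus(status) || attempt >= maxAttempts) {
        throw error;
      }
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

A call would then be wrapped as withRetry(() => anthropic.messages.create(params)), where params stands for the request object built earlier.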
Validation Checklist
// imagePath and mediaType are assumed to be defined by the caller

// 1. Check file exists
if (!fs.existsSync(imagePath)) {
  throw new Error("Image file not found");
}

// 2. Check file size (5MB limit)
const stats = fs.statSync(imagePath);
if (stats.size > 5 * 1024 * 1024) {
  throw new Error("Image too large (max 5MB for Anthropic)");
}

// 3. Verify base64 encoding
const base64Image = fs.readFileSync(imagePath, "base64");
if (!base64Image || base64Image.length === 0) {
  throw new Error("Failed to encode image");
}

// 4. Validate media type
const validTypes = ["image/jpeg", "image/png", "image/webp", "image/gif"];
if (!validTypes.includes(mediaType)) {
  throw new Error(`Unsupported media type: ${mediaType}`);
}
Practice Exercises
1. Multi-Image Analysis
messages: [
  {
    role: "user",
    content: [
      { type: "text", text: "Compare these two travel destinations:" },
      {
        type: "image",
        source: {
          type: "base64",
          media_type: "image/jpeg",
          data: image1Base64,
        },
      },
      {
        type: "image",
        source: {
          type: "base64",
          media_type: "image/jpeg",
          data: image2Base64,
        },
      },
    ],
  },
];
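With more than two images, writing each block by hand gets repetitive, and the model has no names to refer to the images by. One option, sketched below, is to build the content array from a list and put a short text label ("Image 1:", "Image 2:") before each image. The helper buildComparisonContent and its local types are hypothetical stand-ins, not SDK exports.

```typescript
type ImageMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

// Local stand-ins for the request content block shapes used in this lesson.
type ContentPart =
  | { type: "text"; text: string }
  | {
      type: "image";
      source: { type: "base64"; media_type: ImageMediaType; data: string };
    };

interface EncodedImage {
  data: string; // raw base64, no data URL prefix
  mediaType: ImageMediaType;
}

// Emits a label and an image block per input, followed by the prompt text.
function buildComparisonContent(prompt: string, images: EncodedImage[]): ContentPart[] {
  const parts: ContentPart[] = [];
  images.forEach((image, index) => {
    parts.push({ type: "text", text: `Image ${index + 1}:` });
    parts.push({
      type: "image",
      source: { type: "base64", media_type: image.mediaType, data: image.data },
    });
  });
  parts.push({ type: "text", text: prompt });
  return parts;
}

const comparisonContent = buildComparisonContent("Compare these two travel destinations.", [
  { data: "AAAA", mediaType: "image/jpeg" }, // placeholder base64
  { data: "BBBB", mediaType: "image/png" }, // placeholder base64
]);
console.log(comparisonContent.length); // 5
```

The returned array can be used as the content value of the user message.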
2. Detailed Structured Analysis
{
  type: "text",
  text: `Analyze this image and provide:
1. Species identification (scientific name)
2. Physical characteristics
3. Habitat description
4. Behavioral observations
5. Conservation status

Format as a numbered list.`
}
3. Safety Analysis
system: "You are a safety expert. Identify all potential hazards.",
messages: [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: "List all safety concerns in this image, ranked by severity."
      },
      {
        type: "image",
        source: { type: "base64", media_type: "image/jpeg", data: imageData }
      }
    ]
  }
]
Claude's Vision Strengths
1. Detailed Reasoning
Claude provides thorough, thoughtful analysis:
// Claude excels at explaining connections
"The marine iguana's dark coloration serves multiple purposes: heat absorption after cold ocean dives, camouflage on volcanic rocks, and UV protection in the harsh equatorial sun..."
2. Safety and Accuracy
Claude is cautious and accurate:
// Claude will acknowledge uncertainty
"While this appears to be a marine iguana, the image angle makes it difficult to confirm certain identifying features..."
3. Contextual Understanding
Claude considers provided context:
// You mentioned: "photo was taken in the Galápagos Islands"
"Given the Galápagos location, this is definitely Amblyrhynchus cristatus, as marine iguanas are endemic only to these islands..."
Use Cases
1. Document Analysis
system: "You are a document analyst. Extract all text and data accurately.",
content: [
  { type: "text", text: "Extract all information from this invoice" },
  { type: "image", source: { /* invoice image */ } }
]
2. Medical Image Review
system: "You are a medical AI assistant. Note: Always recommend professional consultation.",
content: [
  { type: "text", text: "Describe what you observe in this medical image" },
  { type: "image", source: { /* scan image */ } }
]
3. Accessibility Alt Text
system: "You are an accessibility specialist. Create descriptive alt text.",
content: [
  { type: "text", text: "Generate detailed alt text for screen readers" },
  { type: "image", source: { /* web image */ } }
]
Key Takeaways
- ✅ Anthropic uses a nested source object for images
- ✅ A separate media_type field is required
- ✅ No data URL prefix - just the raw base64 string
- ✅ Response is an array of content blocks; filter for text
- ✅ 5MB size limit (smaller than OpenAI)
- ✅ Claude provides detailed, nuanced analysis
- ✅ max_tokens is required (as with all Claude calls)
Next Steps
You've mastered OpenAI and Anthropic's vision APIs. Now let's see Google Gemini's approach!
Next: Lesson 2c - Gemini Media Prompts →
Quick Reference
Minimal Working Example
import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const anthropic = new Anthropic();
const base64Image = fs.readFileSync("image.jpg", "base64");

const response = await anthropic.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 1000,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image",
          source: {
            type: "base64",
            media_type: "image/jpeg",
            data: base64Image,
          },
        },
      ],
    },
  ],
});

const text = response.content
  .filter((b) => b.type === "text")
  .map((b) => b.text)
  .join("\n");

console.log(text);
Common Pitfalls
- ❌ Including the data:image/jpeg;base64, prefix (don't!)
- ❌ Forgetting the media_type field
- ❌ Images over 5MB (compress first!)
- ❌ Forgetting to filter content blocks
- ❌ Omitting the required max_tokens
Completed Lesson 2b! You can now analyze images with Anthropic Claude. 🎉