Lesson 2b Anthropic Media | Module 3

Lesson 2b: Media Prompts with Anthropic Claude

Learn how to analyze images with Claude through Anthropic's Messages API. This lesson shows Claude's structured approach to multimodal input, in which each image is a content block with explicit metadata.

What It Does

Sends an image along with text instructions to Claude and receives detailed, thoughtful analysis. Claude excels at nuanced visual understanding and detailed reasoning.

Key Differences from OpenAI

Structured Source Object: Images use a nested source object
Explicit Metadata: Separate media_type field required
Content Blocks Response: Response is array of content blocks
max_tokens Required: Must specify maximum token limit

Code Example

The complete code is in src/anthropic/media-prompt.ts:

import Anthropic from "@anthropic-ai/sdk";
import dotenv from "dotenv";
import fs from "fs";

// Load environment variables
dotenv.config();

// Create the Anthropic client; with no arguments it reads ANTHROPIC_API_KEY from the environment
const anthropic = new Anthropic();

// Async function with proper return type
async function mediaPrompt(): Promise<void> {
  try {
    console.log("Testing Anthropic connection...");
    // Read image file and convert to base64
    const base64Image = fs.readFileSync("./src/assets/lizard.jpg", "base64");

    // Make API call - response is automatically typed!
    // Using a system prompt along with user prompt
    const response = await anthropic.messages.create({
      model: "claude-haiku-4-5",
      max_tokens: 1000,
      system:
        "You are a naturalist. Provide detailed information about the flora and fauna in the image.",
      messages: [
        {
          role: "user",
          content: [
            {
              type: "text",
              text: "Analyze the image and describe the species present, their behaviors, and any notable ecological interactions. The photo was taken in the Galápagos Islands.",
            },
            {
              type: "image",
              source: {
                type: "base64",
                media_type: "image/jpeg",
                data: base64Image,
              },
            },
          ],
        },
      ],
    });

    console.log("✅  Media Prompt Success!");
    // show response usage
    console.log("Tokens used:");
    console.dir(response.usage, { depth: null });

    // Check if we got a response
    if (!response.content || response.content.length === 0) {
      throw new Error("No content in response");
    }

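    // Extract text
    // response.content is an array of content blocks; keep only the text blocks.
    // The explicit type guard narrows each block to Anthropic.TextBlock, so block.text
    // type-checks even on TypeScript versions that do not infer this from the callback.
    const textBlocks = response.content.filter(
      (block): block is Anthropic.TextBlock => block.type === "text",
    );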

    if (textBlocks.length === 0) {
      throw new Error("No text content in response");
    }

    // TypeScript knows the structure of response
    console.log(
      "AI Response:",
      textBlocks.map((block) => block.text).join("\n"),
    );
  } catch (error) {
    // Proper error handling with type guards
    if (error instanceof Anthropic.APIError) {
      console.log("❌ API Error:", error.status, error.message);
    } else if (error instanceof Error) {
      console.log("❌ Error:", error.message);
    } else {
      console.log("❌ Unknown error occurred");
    }
  }
}

// Run the test
mediaPrompt().catch((error) => {
  console.error("Error:", error);
});

Run It

pnpm tsx src/anthropic/media-prompt.ts

Expected Output

Testing Anthropic connection...
✅  Media Prompt Success!
Tokens used:
{
  input_tokens: 1289,
  output_tokens: 245
}
AI Response: This image features a marine iguana (Amblyrhynchus cristatus),
a remarkable species that is endemic to the Galápagos Islands. As the world's
only marine lizard, this creature has evolved extraordinary adaptations that
enable it to forage for algae in the ocean...

Key Concepts

1. Structured Image Source

Anthropic uses a nested object structure for images:

{
  type: "image",
  source: {
    type: "base64",           // Source type
    media_type: "image/jpeg", // MIME type
    data: base64Image         // Base64 string (no data URL prefix!)
  }
}

Important: Don't include the data:image/jpeg;base64, prefix. Pass only the raw base64 string.
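If the image arrives as a data URL (for example from a browser upload), the prefix has to be removed before the data goes into the source object. The sketch below is one way to do that. toImageBlock and its local types are hypothetical names for illustration, not part of the Anthropic SDK.

```typescript
// Hypothetical helper (not part of the Anthropic SDK): accepts either a raw
// base64 string or a full data URL and returns an image content block.
type ImageMediaType = "image/jpeg" | "image/png" | "image/webp" | "image/gif";

interface Base64ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: ImageMediaType; data: string };
}

// Matches "data:image/<format>;base64,<payload>" for the four supported formats.
const DATA_URL = /^data:(image\/(?:jpeg|png|webp|gif));base64,([\s\S]*)$/;

function toImageBlock(
  input: string,
  fallbackType: ImageMediaType = "image/jpeg",
): Base64ImageBlock {
  const match = DATA_URL.exec(input);
  // With a data URL, take the media type and payload from the URL itself;
  // otherwise treat the whole input as raw base64 of the fallback type.
  const media_type = match ? (match[1] as ImageMediaType) : fallbackType;
  const data = match ? match[2] : input;
  return { type: "image", source: { type: "base64", media_type, data } };
}

const block = toImageBlock("data:image/png;base64,iVBORw0KGgo=");
console.log(block.source.media_type); // "image/png"
console.log(block.source.data); // "iVBORw0KGgo="
```

Reading the media type from the data URL keeps media_type and the payload in agreement, which avoids a mislabeled image.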

2. Media Types

Specify the correct MIME type:

File Format	Media Type
JPEG	`image/jpeg`
PNG	`image/png`
WebP	`image/webp`
GIF	`image/gif`

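// Dynamic media type (extension-based lookup)
import path from "path";

type ImageMediaType = "image/jpeg" | "image/png" | "image/webp" | "image/gif";

const getMediaType = (filepath: string): ImageMediaType => {
  const ext = path.extname(filepath).toLowerCase();
  // Typed as a Record so that indexing with an arbitrary string compiles in strict mode
  const types: Record<string, ImageMediaType> = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".gif": "image/gif",
  };
  // Unknown extensions fall back to JPEG
  return types[ext] ?? "image/jpeg";
};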
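A file extension can be wrong, for instance when a PNG has been saved as photo.jpg. As a complementary check, the media type can be read from the file's leading bytes. The sketch below compares against the well-known signatures of the four supported formats. sniffMediaType and startsWith are hypothetical helper names, not SDK functions.

```typescript
type ImageMediaType = "image/jpeg" | "image/png" | "image/webp" | "image/gif";

// True when `bytes` contains `signature` starting at `offset`.
function startsWith(bytes: Uint8Array, signature: number[], offset = 0): boolean {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((value, i) => bytes[offset + i] === value);
}

// Hypothetical helper: identifies the format from its magic bytes,
// or returns null when none of the four signatures match.
function sniffMediaType(bytes: Uint8Array): ImageMediaType | null {
  // JPEG files start with FF D8 FF
  if (startsWith(bytes, [0xff, 0xd8, 0xff])) return "image/jpeg";
  // PNG files start with an 8-byte signature: 89 "PNG" 0D 0A 1A 0A
  if (startsWith(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) {
    return "image/png";
  }
  // GIF files start with the ASCII text "GIF8" (GIF87a or GIF89a)
  if (startsWith(bytes, [0x47, 0x49, 0x46, 0x38])) return "image/gif";
  // WebP files are RIFF containers: "RIFF" at byte 0 and "WEBP" at byte 8
  if (
    startsWith(bytes, [0x52, 0x49, 0x46, 0x46]) &&
    startsWith(bytes, [0x57, 0x45, 0x42, 0x50], 8)
  ) {
    return "image/webp";
  }
  return null;
}

const pngHeader = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(sniffMediaType(pngHeader)); // "image/png"
```

In Node.js, the Buffer returned by fs.readFileSync is a Uint8Array, so it can be passed to sniffMediaType directly.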

3. Content Blocks Response

Claude returns an array of content blocks:

// Response structure
{
  id: "msg_01ABC123",
  type: "message",
  role: "assistant",
  content: [
    {
      type: "text",
      text: "Your detailed analysis here..."
    }
  ],
  usage: {
    input_tokens: 1289,
    output_tokens: 245
  }
}

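Why blocks? A single response can contain more than one kind of content. Besides text blocks, Claude can return other block types, such as tool_use blocks when tools are enabled, so code should select blocks by type rather than assume that content[0] is text.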

4. Extracting Text Response

Always filter for text blocks:

// Safe extraction with an explicit type guard, so TypeScript narrows each block to a text block
const textBlocks = response.content.filter(
  (block): block is Anthropic.TextBlock => block.type === "text",
);

if (textBlocks.length === 0) {
  throw new Error("No text content in response");
}

// Join all text blocks
const fullText = textBlocks.map((block) => block.text).join("\n");
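The same extraction can be wrapped in a reusable function. The sketch below uses simplified local stand-ins for the SDK's block types so that it runs without the SDK installed. The interfaces, isTextBlock, and extractText are illustrative names, and the real content block union has more members and fields.

```typescript
// Simplified local stand-ins for the SDK's content block types (assumption:
// the real union is larger and each block carries more fields).
interface TextBlock {
  type: "text";
  text: string;
}
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: unknown;
}
type ContentBlock = TextBlock | ToolUseBlock;

// Type guard: narrows a ContentBlock to a TextBlock.
const isTextBlock = (block: ContentBlock): block is TextBlock =>
  block.type === "text";

// Joins every text block with newlines; throws when there are none.
function extractText(content: ContentBlock[]): string {
  const textBlocks = content.filter(isTextBlock);
  if (textBlocks.length === 0) {
    throw new Error("No text content in response");
  }
  return textBlocks.map((block) => block.text).join("\n");
}

const sample: ContentBlock[] = [
  { type: "text", text: "A marine iguana" },
  { type: "tool_use", id: "toolu_example", name: "lookup_species", input: {} },
  { type: "text", text: "basking on volcanic rock." },
];
console.log(extractText(sample));
// A marine iguana
// basking on volcanic rock.
```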

Side-by-Side Comparison

OpenAI Approach

{
  type: "input_image",
  image_url: `data:image/jpeg;base64,${base64Image}`,
  detail: "auto"
}

// Response
const text = response.output_text;

Anthropic Approach

{
  type: "image",
  source: {
    type: "base64",
    media_type: "image/jpeg",
    data: base64Image  // No data URL prefix!
  }
}

// Response
const textBlocks = response.content.filter(b => b.type === "text");
const text = textBlocks.map(b => b.text).join("\n");

Image Requirements

Supported Formats

✅ JPEG (.jpg, .jpeg)
✅ PNG (.png)
✅ WebP (.webp)
✅ GIF (.gif, non-animated)

Size Limits

Max file size: 5MB (smaller than OpenAI!)
Recommended: Under 3MB for faster processing
Resolution: Images above the API's resolution limits are downscaled before analysis, so oversized images add upload time without adding detail

Image Optimization

import sharp from "sharp";

// Compress for Anthropic's 5MB limit
const optimizedImage = await sharp("large-image.jpg")
  .resize(2048, 2048, { fit: "inside" })
  .jpeg({ quality: 85 })
  .toBuffer();

// Check size
if (optimizedImage.length > 5 * 1024 * 1024) {
  throw new Error("Image still too large after compression");
}

const base64Image = optimizedImage.toString("base64");

Token Usage and Costs

Understanding Token Consumption

Image inputs account for most of the tokens in a vision request:

console.log("Input tokens:", response.usage.input_tokens); // ~1289
console.log("Output tokens:", response.usage.output_tokens); // ~245

Token Breakdown:

Text prompt: ~50 tokens
Image processing: ~1200+ tokens
System instruction: ~20 tokens
Total input: ~1289 tokens
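The image share of the input can be estimated before sending a request. Anthropic's vision documentation gives a rule of thumb of roughly width × height / 750 tokens for an image that does not need to be resized. The sketch below applies that approximation. estimateImageTokens is a hypothetical helper, and the actual count reported in response.usage may differ.

```typescript
// Hypothetical helper: approximate token cost of an image from its pixel
// dimensions, using the width * height / 750 rule of thumb.
// Assumes the image is small enough that the API will not downscale it.
function estimateImageTokens(widthPx: number, heightPx: number): number {
  if (widthPx <= 0 || heightPx <= 0) {
    throw new Error("Image dimensions must be positive");
  }
  return Math.ceil((widthPx * heightPx) / 750);
}

console.log(estimateImageTokens(1200, 750)); // 1200
console.log(estimateImageTokens(1000, 1000)); // 1334
```

Under this approximation, an image of about 900,000 pixels accounts for the ~1200 image tokens in the breakdown above.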

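Cost Calculation (Claude Haiku 4.5)

Input: 1289 tokens × $1.00/1M = $0.00129
Output: 245 tokens × $5.00/1M = $0.00123
Total: ~$0.0025 per analysis

These are Claude Haiku 4.5 list prices at the time of writing; check Anthropic's pricing page for current rates. At about a quarter of a cent per image, Haiku remains practical for high-volume image analysis.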
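The same arithmetic can be applied to the usage object of any response. In the sketch below, estimateCostUsd and the Rates shape are hypothetical, and the rates in the example are placeholders to be replaced with the current prices for your model.

```typescript
// Mirrors the two usage fields printed in this lesson.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Prices in USD per million tokens (hypothetical shape for this sketch).
interface Rates {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Hypothetical helper: cost of one request in USD.
function estimateCostUsd(usage: Usage, rates: Rates): number {
  const inputCost = (usage.input_tokens / 1e6) * rates.inputPerMTok;
  const outputCost = (usage.output_tokens / 1e6) * rates.outputPerMTok;
  return inputCost + outputCost;
}

// Example rates only; substitute the current prices for your model.
const exampleRates: Rates = { inputPerMTok: 1, outputPerMTok: 5 };
const cost = estimateCostUsd(
  { input_tokens: 1289, output_tokens: 245 },
  exampleRates,
);
console.log(`$${cost.toFixed(5)} per analysis`);
```

Passing the rates in as a parameter keeps the calculation valid when prices or models change.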

System Instructions

System prompt is a top-level parameter:

const response = await anthropic.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 1000,
  system: "You are a naturalist. Provide detailed information about the flora and fauna in the image.",
  messages: [...]
});

Effective System Prompts for Vision

// Travel guide
system: "You are an enthusiastic travel guide. Identify landmarks and share fascinating historical facts.";

// Art critic
system: "You are an art historian. Analyze composition, technique, and artistic style.";

// Safety inspector
system: "You are a safety inspector. Identify potential hazards and safety concerns.";

// Naturalist
system: "You are a naturalist. Provide scientific names, behaviors, and ecological context.";

Error Handling

Common Errors

try {
  const response = await anthropic.messages.create({...});
} catch (error) {
  if (error instanceof Anthropic.APIError) {
    switch (error.status) {
      case 400:
        // Check: valid base64, correct media_type, image size
        console.log("Invalid request:", error.message);
        break;
      case 401:
        console.log("Invalid API key");
        break;
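      case 413:
        // The whole request body exceeds the API's size limit;
        // a single oversized image is usually reported as a 400 instead
        console.log("Request too large");
        break;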
      case 429:
        console.log("Rate limit exceeded");
        break;
      default:
        console.log("API Error:", error.message);
    }
  }
}
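Some of these errors are worth retrying (429, server errors) and others are not (400, 401). The sketch below separates the two and computes an exponential backoff delay. The set of retryable codes and both helper names are assumptions of this example. The SDK also retries certain failures on its own, so a wrapper like this is only needed for custom retry policies.

```typescript
// Status codes treated as transient in this sketch (an assumption; adjust as needed).
const RETRYABLE_STATUS = new Set([408, 409, 429]);

// Hypothetical helper: decides whether a failed request is worth retrying.
function isRetryable(status: number | undefined): boolean {
  // No HTTP status usually means the request never reached the server.
  if (status === undefined) return true;
  return RETRYABLE_STATUS.has(status) || status >= 500;
}

// Hypothetical helper: exponential backoff of 1s, 2s, 4s, ... capped at maxMs.
// `attempt` is zero-based.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

console.log(isRetryable(429)); // true
console.log(isRetryable(400)); // false
console.log(backoffDelayMs(3)); // 8000
```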

Validation Checklist

// 1. Check file exists
if (!fs.existsSync(imagePath)) {
  throw new Error("Image file not found");
}

// 2. Check file size (5MB limit)
const stats = fs.statSync(imagePath);
if (stats.size > 5 * 1024 * 1024) {
  throw new Error("Image too large (max 5MB for Anthropic)");
}

// 3. Verify base64 encoding
const base64Image = fs.readFileSync(imagePath, "base64");
if (!base64Image || base64Image.length === 0) {
  throw new Error("Failed to encode image");
}

// 4. Validate media type
const validTypes = ["image/jpeg", "image/png", "image/webp", "image/gif"];
if (!validTypes.includes(mediaType)) {
  throw new Error(`Unsupported media type: ${mediaType}`);
}
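The checks above can also be collected into one function that reports every problem at once instead of throwing on the first. This is a minimal sketch: validateImage and ImageCandidate are hypothetical names, and the base64 pattern only checks the character set and padding, not whether the bytes decode to a valid image.

```typescript
const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // 5MB limit used throughout this lesson
const VALID_TYPES = ["image/jpeg", "image/png", "image/webp", "image/gif"];

// Hypothetical input shape: file size, declared media type, encoded data.
interface ImageCandidate {
  sizeBytes: number;
  mediaType: string;
  base64: string;
}

// Returns a list of problems; an empty list means every check passed.
function validateImage(candidate: ImageCandidate): string[] {
  const problems: string[] = [];

  if (candidate.sizeBytes > MAX_IMAGE_BYTES) {
    problems.push("Image too large (max 5MB for Anthropic)");
  }

  if (candidate.base64.length === 0) {
    problems.push("Base64 data is empty");
  } else if (!/^[A-Za-z0-9+/]+={0,2}$/.test(candidate.base64)) {
    // Catches a leftover data URL prefix, whitespace, or other stray characters.
    problems.push("Data is not plain base64");
  }

  if (!VALID_TYPES.includes(candidate.mediaType)) {
    problems.push(`Unsupported media type: ${candidate.mediaType}`);
  }

  return problems;
}

console.log(
  validateImage({
    sizeBytes: 1024,
    mediaType: "image/bmp",
    base64: "data:image/bmp;base64,Qk0=",
  }),
);
// [ 'Data is not plain base64', 'Unsupported media type: image/bmp' ]
```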

Practice Exercises

1. Multi-Image Analysis

messages: [
  {
    role: "user",
    content: [
      { type: "text", text: "Compare these two travel destinations:" },
      {
        type: "image",
        source: {
          type: "base64",
          media_type: "image/jpeg",
          data: image1Base64,
        },
      },
      {
        type: "image",
        source: {
          type: "base64",
          media_type: "image/jpeg",
          data: image2Base64,
        },
      },
    ],
  },
];
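With more than two images, writing each block by hand gets repetitive. The sketch below builds the content array from a list of encoded images and labels each one ("Image 1:", "Image 2:") so that the prompt can refer to them by number. buildComparisonContent and its types are hypothetical. The labels and the placement of the question after the images are choices of this sketch, not API requirements.

```typescript
type ImageMediaType = "image/jpeg" | "image/png" | "image/webp" | "image/gif";

interface TextPart {
  type: "text";
  text: string;
}
interface ImagePart {
  type: "image";
  source: { type: "base64"; media_type: ImageMediaType; data: string };
}
type ContentPart = TextPart | ImagePart;

// Hypothetical input shape: one already-encoded image.
interface EncodedImage {
  data: string; // raw base64, no data URL prefix
  mediaType: ImageMediaType;
}

// Hypothetical helper: emits a label and an image block per image,
// followed by the question itself.
function buildComparisonContent(
  prompt: string,
  images: EncodedImage[],
): ContentPart[] {
  const parts: ContentPart[] = [];
  images.forEach((image, index) => {
    parts.push({ type: "text", text: `Image ${index + 1}:` });
    parts.push({
      type: "image",
      source: { type: "base64", media_type: image.mediaType, data: image.data },
    });
  });
  parts.push({ type: "text", text: prompt });
  return parts;
}

const content = buildComparisonContent("Compare these two travel destinations.", [
  { data: "QUJD", mediaType: "image/jpeg" },
  { data: "REVG", mediaType: "image/png" },
]);
console.log(content.length); // 5
```

The returned array has the same shape as the content field in the exercise above.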

2. Detailed Structured Analysis

{
  type: "text",
  text: `Analyze this image and provide:
1. Species identification (scientific name)
2. Physical characteristics
3. Habitat description
4. Behavioral observations
5. Conservation status
Format as a numbered list.`
}
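Asking for a numbered list also makes the reply easier to process in code. The sketch below splits such a reply into its items. parseNumberedList is a hypothetical helper, and it assumes the model followed the requested format. Any text before the first numbered line is ignored.

```typescript
// Hypothetical helper: splits "1. ..." / "2) ..." style replies into items.
// Unnumbered, non-blank lines are treated as continuations of the current item.
function parseNumberedList(text: string): string[] {
  const items: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    const match = /^\s*\d+[.)]\s+(.*)$/.exec(line);
    if (match) {
      items.push(match[1].trim());
    } else if (items.length > 0 && line.trim() !== "") {
      items[items.length - 1] += " " + line.trim();
    }
  }
  return items;
}

const reply = [
  "Here is the analysis:",
  "1. Species: Amblyrhynchus cristatus",
  "2. Dark grey body",
  "   with a spiny dorsal crest",
  "3. Rocky shoreline habitat",
].join("\n");

console.log(parseNumberedList(reply));
// [
//   'Species: Amblyrhynchus cristatus',
//   'Dark grey body with a spiny dorsal crest',
//   'Rocky shoreline habitat'
// ]
```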

3. Safety Analysis

system: "You are a safety expert. Identify all potential hazards.",
messages: [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: "List all safety concerns in this image, ranked by severity."
      },
      {
        type: "image",
        source: { type: "base64", media_type: "image/jpeg", data: imageData }
      }
    ]
  }
]

Claude's Vision Strengths

1. Detailed Reasoning

Claude provides thorough, thoughtful analysis:

// Claude excels at explaining connections
"The marine iguana's dark coloration serves multiple purposes:
heat absorption after cold ocean dives, camouflage on volcanic
rocks, and UV protection in the harsh equatorial sun..."

2. Safety and Accuracy

Claude tends to state its uncertainty rather than guess:

// Claude will acknowledge uncertainty
"While this appears to be a marine iguana, the image angle
makes it difficult to confirm certain identifying features..."

3. Contextual Understanding

Claude considers provided context:

// You mentioned: "photo was taken in the Galápagos Islands"
"Given the Galápagos location, this is definitely Amblyrhynchus
cristatus, as marine iguanas are endemic only to these islands..."

Use Cases

1. Document Analysis

system: "You are a document analyst. Extract all text and data accurately.",
content: [
  { type: "text", text: "Extract all information from this invoice" },
  { type: "image", source: { /* invoice image */ } }
]

2. Medical Image Review

system: "You are a medical AI assistant. Note: Always recommend professional consultation.",
content: [
  { type: "text", text: "Describe what you observe in this medical image" },
  { type: "image", source: { /* scan image */ } }
]

3. Accessibility Alt Text

system: "You are an accessibility specialist. Create descriptive alt text.",
content: [
  { type: "text", text: "Generate detailed alt text for screen readers" },
  { type: "image", source: { /* web image */ } }
]
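The three use cases share one request shape and differ only in the system prompt, the user text, and the image. The sketch below makes that explicit with a small builder. buildVisionRequest and its types are hypothetical, and the default model and max_tokens simply repeat the values used earlier in this lesson. The resulting object is intended to be passed to anthropic.messages.create.

```typescript
interface ImageSource {
  type: "base64";
  media_type: "image/jpeg" | "image/png" | "image/webp" | "image/gif";
  data: string;
}

// Hypothetical request shape covering only the fields used in this lesson.
interface VisionRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: {
    role: "user";
    content: (
      | { type: "text"; text: string }
      | { type: "image"; source: ImageSource }
    )[];
  }[];
}

// Hypothetical builder: one system prompt, one instruction, one image.
function buildVisionRequest(
  system: string,
  prompt: string,
  source: ImageSource,
  model = "claude-haiku-4-5",
  maxTokens = 1000,
): VisionRequest {
  return {
    model,
    max_tokens: maxTokens,
    system,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image", source },
        ],
      },
    ],
  };
}

const altTextRequest = buildVisionRequest(
  "You are an accessibility specialist. Create descriptive alt text.",
  "Generate detailed alt text for screen readers",
  { type: "base64", media_type: "image/png", data: "iVBORw0KGgo=" },
);
console.log(altTextRequest.messages[0].content.length); // 2
```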

Key Takeaways

✅ Anthropic uses nested source object for images
✅ Separate media_type field is required
✅ No data URL prefix - just raw base64 string
✅ Response is array of content blocks, filter for text
✅ 5MB size limit (smaller than OpenAI)
✅ Claude provides detailed, nuanced analysis
✅ max_tokens is required (as with all Claude calls)

Next Steps

You've mastered OpenAI and Anthropic's vision APIs. Now let's see Google Gemini's approach!

Next: Lesson 2c - Gemini Media Prompts →

Quick Reference

Minimal Working Example

import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const anthropic = new Anthropic();
const base64Image = fs.readFileSync("image.jpg", "base64");

const response = await anthropic.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 1000,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image",
          source: {
            type: "base64",
            media_type: "image/jpeg",
            data: base64Image,
          },
        },
      ],
    },
  ],
});

const text = response.content
  .filter((b) => b.type === "text")
  .map((b) => b.text)
  .join("\n");

console.log(text);

Common Pitfalls

❌ Including data:image/jpeg;base64, prefix (don't!)
❌ Forgetting media_type field
❌ Images over 5MB (compress first!)
❌ Forgetting to filter content blocks
❌ Omitting required max_tokens

Completed Lesson 2b! You can now analyze images with Anthropic Claude. 🎉
