Module 2 (Gemini) - Lesson 2e: Streaming with Gemini

Stream Gemini responses with generateContentStream for responsive UIs.

Published: 1/15/2026

Lesson 2e: Streaming Responses with Google Gemini

Learn how to stream Gemini's responses in real time for a better user experience. Gemini uses an async-iterator approach similar to OpenAI's, unlike Anthropic's event-handler style.

Key Differences from OpenAI and Anthropic

OpenAI: Uses async iterator with for await

const stream = await openai.responses.stream({
  model: "gpt-5-nano",
  input: "Hello"
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta || '');
}

Anthropic: Uses .stream() method with event handlers

const response = await anthropic.messages
  .stream({ ... })
  .on("text", (text) => {
    process.stdout.write(text);
  });

Gemini: Uses generateContentStream with async iterator

const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Hello"
});

for await (const chunk of response) {
  process.stdout.write(chunk.text || '');
}

Code Example

Create src/gemini/stream-prompt.ts:

import { GoogleGenAI, ApiError } from "@google/genai";
import dotenv from "dotenv";

// Load environment variables
dotenv.config();

// Create Gemini client
const gemini = new GoogleGenAI({});

// Async function with proper return type
async function streamPrompt(): Promise<void> {
  try {
    console.log("Testing Gemini connection...");
    console.log("---------Streaming event data start-------");

    const response = await gemini.models.generateContentStream({
      model: "gemini-3-flash-preview",
      contents: {
        role: "user",
        parts: [
          {
            text: "Suggest a travel destination within Europe where there is a Christmas market that is famous but is not in a big city. I would like to go somewhere that is less than 2 hours from a major airport and has good public transport links.",
          },
        ],
      },
      config: {
        systemInstruction:
          "You are a helpful travel assistant. Provide detailed travel suggestions based on user preferences and include distance from the airport.",
      },
    });

    let finalUsage = null;
    let fullText = "";

    // Iterate over streamed chunks
    for await (const chunk of response) {
      // Store usage metadata from last chunk
      if (chunk.usageMetadata) {
        finalUsage = chunk.usageMetadata;
      }

      // Write text as it arrives
      if (chunk.text) {
        fullText += chunk.text;
        process.stdout.write(chunk.text);
      }
    }

    console.log("\n\n---------Streaming event data end-------");
    console.log("Stream Prompt Success!");
    console.log("Tokens used:");
    console.dir(finalUsage, { depth: null });
  } catch (error) {
    if (error instanceof ApiError) {
      console.log("API Error:", error.status, error.message);
    } else if (error instanceof Error) {
      console.log("Error:", error.message);
    } else {
      console.log("Unknown error occurred");
    }
  }
}

// Run the test
streamPrompt().catch((error) => {
  console.error("Error:", error);
});

Run It

pnpm tsx src/gemini/stream-prompt.ts

You'll see the response appear word by word in real time!


Understanding Gemini Streaming

The Async Iterator Pattern

Gemini uses the standard JavaScript async iterator pattern:

const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Tell me a story"
});

for await (const chunk of response) {
  console.log(chunk.text);
}

Chunk Structure

Each streamed chunk contains:

{
  text: "partial response...",  // Text portion
  candidates: [{
    content: {
      parts: [{ text: "partial response..." }],
      role: "model"
    },
    finishReason: null  // Only set on final chunk
  }],
  usageMetadata: {  // Only on final chunk
    promptTokenCount: 10,
    candidatesTokenCount: 50,
    totalTokenCount: 60
  }
}

Accessing Chunk Data

for await (const chunk of response) {
  // Direct text access
  const text = chunk.text;

  // Or via candidates
  const text2 = chunk.candidates?.[0]?.content?.parts?.[0]?.text;

  // Check for completion
  const isComplete = chunk.candidates?.[0]?.finishReason === "STOP";

  // Usage (only on final chunk)
  const usage = chunk.usageMetadata;
}

Practical Examples

1. Chat Interface

async function chatWithStreaming(userMessage: string) {
  let fullResponse = "";

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: userMessage
  });

  for await (const chunk of response) {
    if (chunk.text) {
      fullResponse += chunk.text;
      process.stdout.write(chunk.text);  // Display in real-time
    }
  }

  return fullResponse;
}

// Usage
const response = await chatWithStreaming("What's the weather like in Paris?");

2. Progress Indicator

async function streamWithProgress(prompt: string) {
  let wordCount = 0;
  let charCount = 0;
  let lastReport = 0;

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  for await (const chunk of response) {
    if (chunk.text) {
      charCount += chunk.text.length;
      wordCount += chunk.text.split(/\s+/).filter(Boolean).length;
      process.stdout.write(chunk.text);

      // Report progress roughly every 50 characters
      // (chunk lengths vary, so check the distance since the last report)
      if (charCount - lastReport >= 50) {
        lastReport = charCount;
        console.log(`\n[${wordCount} words, ${charCount} chars]`);
      }
    }
  }
}

3. Accumulate and Process

async function streamAndAccumulate(prompt: string) {
  const chunks: string[] = [];
  let usage = null;

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  for await (const chunk of response) {
    if (chunk.text) {
      chunks.push(chunk.text);
      process.stdout.write(chunk.text);
    }
    if (chunk.usageMetadata) {
      usage = chunk.usageMetadata;
    }
  }

  return {
    chunks,              // Individual pieces
    full: chunks.join(""), // Complete text
    usage               // Token usage
  };
}

4. With Timeout/Cancellation

async function streamWithTimeout(prompt: string, timeoutMs: number) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const response = await gemini.models.generateContentStream({
      model: "gemini-3-flash-preview",
      contents: prompt
    });

    let result = "";
    for await (const chunk of response) {
      // Aborting stops consuming the stream on the client side
      if (controller.signal.aborted) break;
      if (chunk.text) {
        result += chunk.text;
        process.stdout.write(chunk.text);
      }
    }
    return result;
  } finally {
    clearTimeout(timeout);
  }
}

Error Handling with Streams

Always handle streaming errors properly:

async function streamWithErrorHandling() {
  try {
    const response = await gemini.models.generateContentStream({
      model: "gemini-3-flash-preview",
      contents: "Tell me a story"
    });

    for await (const chunk of response) {
      if (chunk.text) {
        process.stdout.write(chunk.text);
      }
    }
  } catch (error) {
    if (error instanceof ApiError) {
      console.error("API Error:", error.status, error.message);

      // Handle specific errors
      if (error.status === 429) {
        console.log("Rate limited - please wait and retry");
      } else if (error.status === 400) {
        console.log("Invalid request - check your parameters");
      }
    } else if (error instanceof Error) {
      console.error("Error:", error.message);
    }
  }
}

Provider Streaming Comparison

OpenAI Approach

const stream = await openai.responses.stream({
  model: "gpt-5-nano",
  input: "Hello"
});

for await (const chunk of stream) {
  const content = chunk.delta || "";
  process.stdout.write(content);
}

// Get final response
const final = await stream.finalResponse();

Anthropic Approach

const stream = await anthropic.messages
  .stream({
    model: "claude-haiku-4-5",
    max_tokens: 1000,
    messages: [{ role: "user", content: "Hello" }]
  })
  .on("text", (text) => {
    process.stdout.write(text);
  });

const final = await stream.finalMessage();

Gemini Approach

const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Hello"
});

let usage = null;
for await (const chunk of response) {
  process.stdout.write(chunk.text || "");
  if (chunk.usageMetadata) usage = chunk.usageMetadata;
}
// Usage is in the last chunk

Key Differences Table

Feature        | OpenAI             | Anthropic          | Gemini
Method         | .stream()          | .stream()          | generateContentStream()
Iteration      | for await loop     | Event handlers     | for await loop
Text access    | chunk.delta        | Callback param     | chunk.text
Final message  | .finalResponse()   | .finalMessage()    | Last chunk
Usage stats    | In final response  | In final message   | In last chunk

When to Use Streaming

Use Streaming For:

  • Chat interfaces - Show responses as they generate
  • Long responses - Improve perceived performance
  • Real-time UIs - Interactive applications
  • User engagement - Keep users engaged during generation
  • Time-to-first-token - Critical latency requirements (see the sketch below)
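
If time-to-first-token matters for your use case, you can measure it directly. A minimal sketch, assuming the gemini client created earlier in this lesson (measureTimeToFirstToken is just an illustrative name):

async function measureTimeToFirstToken(prompt: string): Promise<number | null> {
  const start = Date.now();
  let firstTokenMs: number | null = null;

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  for await (const chunk of response) {
    // Record the moment the first text chunk arrives
    if (chunk.text && firstTokenMs === null) {
      firstTokenMs = Date.now() - start;
      console.log(`First token after ${firstTokenMs} ms`);
    }
    // Keep draining so the request completes normally
  }

  return firstTokenMs;
}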

Don't Use Streaming For:

  • Batch processing - No user watching (see the non-streaming sketch after this list)
  • API responses - Wait for complete response
  • Complex parsing - Easier with full text
  • Caching - Cache complete responses
  • Short responses - Overhead not worth it
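
For the batch, caching, and short-response cases above, the non-streaming generateContent call is usually simpler: it returns the complete response (and usage) in one object. A minimal sketch, again reusing the gemini client from earlier:

async function generateOnce(prompt: string) {
  // Non-streaming: wait for the full response, then use it all at once
  const response = await gemini.models.generateContent({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  console.log(response.text);                            // Complete text
  console.log(response.usageMetadata?.totalTokenCount);  // Usage on the single response
  return response;
}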

Best Practices

1. Always Store Usage Metadata

let usage = null;
for await (const chunk of response) {
  if (chunk.usageMetadata) usage = chunk.usageMetadata;
  // ... process text
}
console.log("Tokens used:", usage?.totalTokenCount);

2. Handle Empty Chunks

for await (const chunk of response) {
  if (chunk.text) {  // Only process if text exists
    process.stdout.write(chunk.text);
  }
}

3. Track Completion State

let isComplete = false;
for await (const chunk of response) {
  if (chunk.candidates?.[0]?.finishReason === "STOP") {
    isComplete = true;
  }
  // ... process
}

4. Consider User Experience

  • Show a loading indicator before streaming starts
  • Display a cursor or typing indicator
  • Allow cancellation for long responses
  • Show progress for very long generations

For example:

console.log("Thinking...");
const response = await gemini.models.generateContentStream({...});
console.log("\nResponse:");
for await (const chunk of response) {
  process.stdout.write(chunk.text || "");
}
console.log("\n[Complete]");

Advanced: Multi-Part Streaming

Stream with structured content:

const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: [
    {
      role: "user",
      parts: [
        { text: "Here's a document to analyze:" },
        { text: longDocument },
        { text: "Please summarize the key points." }
      ]
    }
  ],
  config: {
    systemInstruction: "You are a document analyst.",
    maxOutputTokens: 2000
  }
});

for await (const chunk of response) {
  process.stdout.write(chunk.text || "");
}

Key Takeaways

  • Use generateContentStream() for streaming responses
  • Uses for await...of pattern (similar to OpenAI, different from Anthropic)
  • Each chunk contains text and potentially usageMetadata
  • Usage stats only available in the final chunk
  • Great for chat interfaces and long responses
  • Always handle errors properly in the loop

Next Steps

Learn how to get structured, validated JSON output from Gemini!

Next: Lesson 2f - Structured Output


Quick Reference

// Basic streaming
const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Hello"
});

let usage = null;
for await (const chunk of response) {
  if (chunk.text) process.stdout.write(chunk.text);
  if (chunk.usageMetadata) usage = chunk.usageMetadata;
}

console.log("\nTokens:", usage?.totalTokenCount);

Common Pitfalls

  1. Forgetting to check if chunk.text exists
  2. Not storing usageMetadata from the last chunk
  3. Using generateContent instead of generateContentStream
  4. Not handling errors inside the async iteration