Module 2 (Gemini) - Lesson 2e: Streaming with Gemini

Stream Gemini responses with generateContentStream for responsive UIs.

Published: 1/15/2026

Lesson 2e: Streaming Responses with Google Gemini

Learn how to stream Gemini's responses in real time for a better user experience. Gemini uses an async-iterator approach similar to OpenAI's, unlike Anthropic's event-handler style.

Key Differences from OpenAI and Anthropic

OpenAI: Uses async iterator with for await

const stream = await openai.responses.stream({
  model: "gpt-5-nano",
  input: "Hello"
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta || '');
}

Anthropic: Uses .stream() method with event handlers

const response = await anthropic.messages
  .stream({ ... })
  .on("text", (text) => {
    process.stdout.write(text);
  });

Gemini: Uses generateContentStream with async iterator

const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Hello"
});

for await (const chunk of response) {
  process.stdout.write(chunk.text || '');
}

Code Example

Create src/gemini/stream-prompt.ts:

import { GoogleGenAI, ApiError } from "@google/genai";
import dotenv from "dotenv";

// Load environment variables
dotenv.config();

// Create Gemini client
const gemini = new GoogleGenAI({});

// Async function with proper return type
async function streamPrompt(): Promise<void> {
  try {
    console.log("Testing Gemini connection...");
    console.log("---------Streaming event data start-------");

    const response = await gemini.models.generateContentStream({
      model: "gemini-3-flash-preview",
      contents: {
        role: "user",
        parts: [
          {
            text: "Suggest a travel destination within Europe where there is a Christmas market that is famous but is not in a big city. I would like to go somewhere that is less than 2 hours from a major airport and has good public transport links.",
          },
        ],
      },
      config: {
        systemInstruction:
          "You are a helpful travel assistant. Provide detailed travel suggestions based on user preferences and include distance from the airport.",
      },
    });

    let finalUsage = null;
    let fullText = "";

    // Iterate over streamed chunks
    for await (const chunk of response) {
      // Store usage metadata from last chunk
      if (chunk.usageMetadata) {
        finalUsage = chunk.usageMetadata;
      }

      // Write text as it arrives
      if (chunk.text) {
        fullText += chunk.text;
        process.stdout.write(chunk.text);
      }
    }

    console.log("\n\n---------Streaming event data end-------");
    console.log("Stream Prompt Success!");
    console.log("Tokens used:");
    console.dir(finalUsage, { depth: null });
  } catch (error) {
    if (error instanceof ApiError) {
      console.log("API Error:", error.status, error.message);
    } else if (error instanceof Error) {
      console.log("Error:", error.message);
    } else {
      console.log("Unknown error occurred");
    }
  }
}

// Run the test
streamPrompt().catch((error) => {
  console.error("Error:", error);
});

Run It

pnpm tsx src/gemini/stream-prompt.ts

You'll see the response appear word by word in real time!


Understanding Gemini Streaming

The Async Iterator Pattern

Gemini uses the standard JavaScript async iterator pattern:

const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Tell me a story"
});

for await (const chunk of response) {
  console.log(chunk.text);
}

Chunk Structure

Each streamed chunk contains:

{
  text: "partial response...",  // Text portion
  candidates: [{
    content: {
      parts: [{ text: "partial response..." }],
      role: "model"
    },
    finishReason: null  // Only set on final chunk
  }],
  usageMetadata: {  // Only on final chunk
    promptTokenCount: 10,
    candidatesTokenCount: 50,
    totalTokenCount: 60
  }
}

Accessing Chunk Data

for await (const chunk of response) {
  // Direct text access
  const text = chunk.text;

  // Or via candidates
  const text2 = chunk.candidates?.[0]?.content?.parts?.[0]?.text;

  // Check for completion
  const isComplete = chunk.candidates?.[0]?.finishReason === "STOP";

  // Usage (only on final chunk)
  const usage = chunk.usageMetadata;
}

Practical Examples

1. Chat Interface

async function chatWithStreaming(userMessage: string) {
  let fullResponse = "";

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: userMessage
  });

  for await (const chunk of response) {
    if (chunk.text) {
      fullResponse += chunk.text;
      process.stdout.write(chunk.text);  // Display in real-time
    }
  }

  return fullResponse;
}

// Usage
const response = await chatWithStreaming("What's the weather like in Paris?");

2. Progress Indicator

async function streamWithProgress(prompt: string) {
  let wordCount = 0;
  let charCount = 0;
  let lastReport = 0;

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  for await (const chunk of response) {
    if (chunk.text) {
      charCount += chunk.text.length;
      wordCount += chunk.text.split(/\s+/).filter(Boolean).length;
      process.stdout.write(chunk.text);

      // Report progress roughly every 50 characters
      // (chunk lengths vary, so check the distance since the last report)
      if (charCount - lastReport >= 50) {
        lastReport = charCount;
        console.log(`\n[${wordCount} words, ${charCount} chars]`);
      }
    }
  }
}

3. Accumulate and Process

async function streamAndAccumulate(prompt: string) {
  const chunks: string[] = [];
  let usage = null;

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  for await (const chunk of response) {
    if (chunk.text) {
      chunks.push(chunk.text);
      process.stdout.write(chunk.text);
    }
    if (chunk.usageMetadata) {
      usage = chunk.usageMetadata;
    }
  }

  return {
    chunks,              // Individual pieces
    full: chunks.join(""), // Complete text
    usage               // Token usage
  };
}

4. With Timeout/Cancellation

async function streamWithTimeout(prompt: string, timeoutMs: number) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const response = await gemini.models.generateContentStream({
      model: "gemini-3-flash-preview",
      contents: prompt
    });

    let result = "";
    for await (const chunk of response) {
      // Aborting stops consuming the stream on the client side
      if (controller.signal.aborted) break;
      if (chunk.text) {
        result += chunk.text;
        process.stdout.write(chunk.text);
      }
    }
    return result;
  } finally {
    clearTimeout(timeout);
  }
}

Error Handling with Streams

Always handle streaming errors properly:

async function streamWithErrorHandling() {
  try {
    const response = await gemini.models.generateContentStream({
      model: "gemini-3-flash-preview",
      contents: "Tell me a story"
    });

    for await (const chunk of response) {
      if (chunk.text) {
        process.stdout.write(chunk.text);
      }
    }
  } catch (error) {
    if (error instanceof ApiError) {
      console.error("API Error:", error.status, error.message);

      // Handle specific errors
      if (error.status === 429) {
        console.log("Rate limited - please wait and retry");
      } else if (error.status === 400) {
        console.log("Invalid request - check your parameters");
      }
    } else if (error instanceof Error) {
      console.error("Error:", error.message);
    }
  }
}

Provider Streaming Comparison

OpenAI Approach

const stream = await openai.responses.stream({
  model: "gpt-5-nano",
  input: "Hello"
});

for await (const chunk of stream) {
  const content = chunk.delta || "";
  process.stdout.write(content);
}

// Get final response
const final = await stream.finalResponse();

Anthropic Approach

const stream = await anthropic.messages
  .stream({
    model: "claude-haiku-4-5",
    max_tokens: 1000,
    messages: [{ role: "user", content: "Hello" }]
  })
  .on("text", (text) => {
    process.stdout.write(text);
  });

const final = await stream.finalMessage();

Gemini Approach

const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Hello"
});

let usage = null;
for await (const chunk of response) {
  process.stdout.write(chunk.text || "");
  if (chunk.usageMetadata) usage = chunk.usageMetadata;
}
// Usage is in the last chunk

Key Differences Table

Feature        | OpenAI             | Anthropic          | Gemini
Method         | .stream()          | .stream()          | generateContentStream()
Iteration      | for await loop     | Event handlers     | for await loop
Text access    | chunk.delta        | Callback param     | chunk.text
Final message  | .finalResponse()   | .finalMessage()    | Last chunk
Usage stats    | In final response  | In final message   | In last chunk

When to Use Streaming

Use Streaming For:

  • Chat interfaces - Show responses as they generate
  • Long responses - Improve perceived performance
  • Real-time UIs - Interactive applications
  • User engagement - Keep users engaged during generation
  • Time-to-first-token - Critical latency requirements (see the sketch below)
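
If time-to-first-token matters for your use case, you can measure it directly. A minimal sketch, assuming the gemini client created earlier in this lesson (measureTimeToFirstToken is just an illustrative name):

async function measureTimeToFirstToken(prompt: string): Promise<number | null> {
  const start = Date.now();
  let firstTokenMs: number | null = null;

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  for await (const chunk of response) {
    // Record the moment the first text chunk arrives
    if (chunk.text && firstTokenMs === null) {
      firstTokenMs = Date.now() - start;
      console.log(`First token after ${firstTokenMs} ms`);
    }
    // Keep draining so the request completes normally
  }

  return firstTokenMs;
}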

Don't Use Streaming For:

  • Batch processing - No user watching (see the non-streaming sketch after this list)
  • API responses - Wait for complete response
  • Complex parsing - Easier with full text
  • Caching - Cache complete responses
  • Short responses - Overhead not worth it
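
For the batch, caching, and short-response cases above, the non-streaming generateContent call is usually simpler: it returns the complete response (and usage) in one object. A minimal sketch, again reusing the gemini client from earlier:

async function generateOnce(prompt: string) {
  // Non-streaming: wait for the full response, then use it all at once
  const response = await gemini.models.generateContent({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  console.log(response.text);                            // Complete text
  console.log(response.usageMetadata?.totalTokenCount);  // Usage on the single response
  return response;
}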

Best Practices

1. Always Store Usage Metadata

let usage = null;
for await (const chunk of response) {
  if (chunk.usageMetadata) usage = chunk.usageMetadata;
  // ... process text
}
console.log("Tokens used:", usage?.totalTokenCount);

2. Handle Empty Chunks

for await (const chunk of response) {
  if (chunk.text) {  // Only process if text exists
    process.stdout.write(chunk.text);
  }
}

3. Track Completion State

let isComplete = false;
for await (const chunk of response) {
  if (chunk.candidates?.[0]?.finishReason === "STOP") {
    isComplete = true;
  }
  // ... process
}

4. Consider User Experience

  • Show a loading indicator before streaming starts
  • Display a cursor or typing indicator
  • Allow cancellation for long responses
  • Show progress for very long generations

For example:

console.log("Thinking...");
const response = await gemini.models.generateContentStream({...});
console.log("\nResponse:");
for await (const chunk of response) {
  process.stdout.write(chunk.text || "");
}
console.log("\n[Complete]");

Advanced: Multi-Part Streaming

Stream with structured content:

const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: [
    {
      role: "user",
      parts: [
        { text: "Here's a document to analyze:" },
        { text: longDocument },
        { text: "Please summarize the key points." }
      ]
    }
  ],
  config: {
    systemInstruction: "You are a document analyst.",
    maxOutputTokens: 2000
  }
});

for await (const chunk of response) {
  process.stdout.write(chunk.text || "");
}

Key Takeaways

  • Use generateContentStream() for streaming responses
  • Uses for await...of pattern (similar to OpenAI, different from Anthropic)
  • Each chunk contains text and potentially usageMetadata
  • Usage stats only available in the final chunk
  • Great for chat interfaces and long responses
  • Always handle errors properly in the loop

Next Steps

Learn how to get structured, validated JSON output from Gemini!

Next: Lesson 2f - Structured Output


Quick Reference

// Basic streaming
const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Hello"
});

let usage = null;
for await (const chunk of response) {
  if (chunk.text) process.stdout.write(chunk.text);
  if (chunk.usageMetadata) usage = chunk.usageMetadata;
}

console.log("\nTokens:", usage?.totalTokenCount);

Common Pitfalls

  1. Forgetting to check if chunk.text exists
  2. Not storing usageMetadata from the last chunk
  3. Using generateContent instead of generateContentStream
  4. Not handling errors inside the async iteration