Module 2 (Gemini) - Lesson 2e: Streaming with Gemini
Stream Gemini responses with generateContentStream for responsive UIs.
Published: 1/15/2026
Lesson 2e: Streaming Responses with Google Gemini
Learn how to stream Gemini's responses in real time for a better user experience. Gemini uses an async-iterator approach similar to OpenAI's, rather than Anthropic's event-handler style.
Key Differences from OpenAI and Anthropic
OpenAI: Uses async iterator with for await
const stream = await openai.responses.stream({
  model: "gpt-5-nano",
  input: "Hello"
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta || '');
}
Anthropic: Uses .stream() method with event handlers
const response = await anthropic.messages
  .stream({ ... })
  .on("text", (text) => {
    process.stdout.write(text);
  });
Gemini: Uses generateContentStream with async iterator
const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Hello"
});

for await (const chunk of response) {
  process.stdout.write(chunk.text || '');
}
Code Example
Create src/gemini/stream-prompt.ts:
import { GoogleGenAI, ApiError } from "@google/genai";
import dotenv from "dotenv";

// Load environment variables
dotenv.config();

// Create Gemini client
const gemini = new GoogleGenAI({});

// Async function with proper return type
async function streamPrompt(): Promise<void> {
  try {
    console.log("Testing Gemini connection...");
    console.log("---------Streaming event data start-------");

    const response = await gemini.models.generateContentStream({
      model: "gemini-3-flash-preview",
      contents: {
        role: "user",
        parts: [
          {
            text: "Suggest a travel destination within Europe where there is a Christmas market that is famous but is not in a big city. I would like to go somewhere that is less than 2 hours from a major airport and has good public transport links.",
          },
        ],
      },
      config: {
        systemInstruction:
          "You are a helpful travel assistant. Provide detailed travel suggestions based on user preferences and include distance from the airport.",
      },
    });

    let finalUsage = null;
    let fullText = "";

    // Iterate over streamed chunks
    for await (const chunk of response) {
      // Store usage metadata from last chunk
      if (chunk.usageMetadata) {
        finalUsage = chunk.usageMetadata;
      }

      // Write text as it arrives
      if (chunk.text) {
        fullText += chunk.text;
        process.stdout.write(chunk.text);
      }
    }

    console.log("\n\n---------Streaming event data end-------");
    console.log("Stream Prompt Success!");
    console.log("Tokens used:");
    console.dir(finalUsage, { depth: null });
  } catch (error) {
    if (error instanceof ApiError) {
      console.log("API Error:", error.status, error.message);
    } else if (error instanceof Error) {
      console.log("Error:", error.message);
    } else {
      console.log("Unknown error occurred");
    }
  }
}

// Run the test
streamPrompt().catch((error) => {
  console.error("Error:", error);
});
Run It
pnpm tsx src/gemini/stream-prompt.ts
You'll see the response appear word-by-word in real-time!
Understanding Gemini Streaming
The Async Iterator Pattern
Gemini uses the standard JavaScript async iterator pattern:
const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Tell me a story"
});

for await (const chunk of response) {
  console.log(chunk.text);
}
Chunk Structure
Each streamed chunk contains:
{ text: "partial response...", // Text portion candidates: [{ content: { parts: [{ text: "partial response..." }], role: "model" }, finishReason: null // Only set on final chunk }], usageMetadata: { // Only on final chunk promptTokenCount: 10, candidatesTokenCount: 50, totalTokenCount: 60 } }
Accessing Chunk Data
for await (const chunk of response) {
  // Direct text access
  const text = chunk.text;

  // Or via candidates
  const text2 = chunk.candidates?.[0]?.content?.parts?.[0]?.text;

  // Check for completion
  const isComplete = chunk.candidates?.[0]?.finishReason === "STOP";

  // Usage (only on final chunk)
  const usage = chunk.usageMetadata;
}
Practical Examples
1. Chat Interface
async function chatWithStreaming(userMessage: string) {
  let fullResponse = "";

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: userMessage
  });

  for await (const chunk of response) {
    if (chunk.text) {
      fullResponse += chunk.text;
      process.stdout.write(chunk.text); // Display in real-time
    }
  }

  return fullResponse;
}

// Usage
const response = await chatWithStreaming("What's the weather like in Paris?");
2. Progress Indicator
async function streamWithProgress(prompt: string) {
  let wordCount = 0;
  let charCount = 0;
  let lastReport = 0;

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  for await (const chunk of response) {
    if (chunk.text) {
      charCount += chunk.text.length;
      wordCount += chunk.text.split(/\s+/).filter(Boolean).length;
      process.stdout.write(chunk.text);

      // Show progress roughly every 50 characters
      // (chunk sizes vary, so an exact modulo check would rarely fire)
      if (charCount - lastReport >= 50) {
        lastReport = charCount;
        console.log(`\n[${wordCount} words, ${charCount} chars]`);
      }
    }
  }
}
3. Accumulate and Process
async function streamAndAccumulate(prompt: string) {
  const chunks: string[] = [];
  let usage = null;

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  for await (const chunk of response) {
    if (chunk.text) {
      chunks.push(chunk.text);
      process.stdout.write(chunk.text);
    }
    if (chunk.usageMetadata) {
      usage = chunk.usageMetadata;
    }
  }

  return {
    chunks,                 // Individual pieces
    full: chunks.join(""),  // Complete text
    usage                   // Token usage
  };
}
4. With Timeout/Cancellation
async function streamWithTimeout(prompt: string, timeoutMs: number) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const response = await gemini.models.generateContentStream({
      model: "gemini-3-flash-preview",
      contents: prompt
    });

    let result = "";

    for await (const chunk of response) {
      // Note: the signal is only checked between chunks here; it stops the
      // loop but does not cancel the underlying request.
      if (controller.signal.aborted) break;

      if (chunk.text) {
        result += chunk.text;
        process.stdout.write(chunk.text);
      }
    }

    return result;
  } finally {
    clearTimeout(timeout);
  }
}
Error Handling with Streams
Always handle streaming errors properly:
async function streamWithErrorHandling() {
  try {
    const response = await gemini.models.generateContentStream({
      model: "gemini-3-flash-preview",
      contents: "Tell me a story"
    });

    for await (const chunk of response) {
      if (chunk.text) {
        process.stdout.write(chunk.text);
      }
    }
  } catch (error) {
    if (error instanceof ApiError) {
      console.error("API Error:", error.status, error.message);

      // Handle specific errors
      if (error.status === 429) {
        console.log("Rate limited - please wait and retry");
      } else if (error.status === 400) {
        console.log("Invalid request - check your parameters");
      }
    } else if (error instanceof Error) {
      console.error("Error:", error.message);
    }
  }
}
Provider Streaming Comparison
OpenAI Approach
const stream = await openai.responses.stream({
  model: "gpt-5-nano",
  input: "Hello"
});

for await (const chunk of stream) {
  const content = chunk.delta || "";
  process.stdout.write(content);
}

// Get final response
const final = await stream.finalResponse();
Anthropic Approach
const stream = await anthropic.messages
  .stream({
    model: "claude-haiku-4-5",
    max_tokens: 1000,
    messages: [{ role: "user", content: "Hello" }]
  })
  .on("text", (text) => {
    process.stdout.write(text);
  });

const final = await stream.finalMessage();
Gemini Approach
const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Hello"
});

let usage = null;

for await (const chunk of response) {
  process.stdout.write(chunk.text || "");
  if (chunk.usageMetadata) usage = chunk.usageMetadata;
}

// Usage is in the last chunk
Key Differences Table
| Feature | OpenAI | Anthropic | Gemini |
|---|---|---|---|
| Method | .stream() | .stream() | generateContentStream() |
| Iteration | for await loop | Event handlers | for await loop |
| Text access | chunk.delta | Callback param | chunk.text |
| Final message | .finalResponse() | .finalMessage() | Last chunk |
| Usage stats | In final response | In final message | In last chunk |
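If you prefer Anthropic-style event handlers, a thin wrapper can emulate them on top of Gemini's async iterator. Here is a minimal sketch using Node's EventEmitter; the streamAsEvents helper is my own illustration, not part of the SDK:

import { EventEmitter } from "node:events";

// Wrap generateContentStream in an EventEmitter so callers can subscribe
// to "text", "end", and "error" events instead of iterating themselves.
function streamAsEvents(prompt: string): EventEmitter {
  const emitter = new EventEmitter();

  (async () => {
    try {
      const response = await gemini.models.generateContentStream({
        model: "gemini-3-flash-preview",
        contents: prompt
      });

      let usage = null;
      for await (const chunk of response) {
        if (chunk.text) emitter.emit("text", chunk.text);
        if (chunk.usageMetadata) usage = chunk.usageMetadata;
      }
      emitter.emit("end", usage);
    } catch (error) {
      emitter.emit("error", error);
    }
  })();

  return emitter;
}

// Usage
streamAsEvents("Hello")
  .on("text", (text) => process.stdout.write(text))
  .on("end", (usage) => console.log("\nTokens:", usage?.totalTokenCount))
  .on("error", (error) => console.error("Stream error:", error));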
When to Use Streaming
Use Streaming For:
- Chat interfaces - Show responses as they generate
- Long responses - Improve perceived performance
- Real-time UIs - Interactive applications
- User engagement - Keep users engaged during generation
- Time-to-first-token - Critical latency requirements (see the timing sketch after these lists)
Don't Use Streaming For:
- Batch processing - No user watching
- API responses - Wait for complete response
- Complex parsing - Easier with full text
- Caching - Cache complete responses
- Short responses - Overhead not worth it
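To quantify the perceived-performance benefit, you can measure time-to-first-token directly while streaming. A minimal sketch (the measureTtft helper name is my own, reusing the gemini client from earlier):

// Measure time-to-first-token and total time for a streamed Gemini call.
async function measureTtft(prompt: string) {
  const start = performance.now();
  let firstTokenAt: number | null = null;

  const response = await gemini.models.generateContentStream({
    model: "gemini-3-flash-preview",
    contents: prompt
  });

  for await (const chunk of response) {
    if (chunk.text && firstTokenAt === null) {
      firstTokenAt = performance.now(); // First visible text arrived
    }
  }

  const totalMs = performance.now() - start;
  const ttftMs = firstTokenAt === null ? totalMs : firstTokenAt - start;

  console.log(`Time to first token: ${ttftMs.toFixed(0)} ms`);
  console.log(`Total generation time: ${totalMs.toFixed(0)} ms`);
}

For long responses, the first token typically arrives long before the full generation finishes, which is exactly the gap streaming hides from the user.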
Best Practices
1. Always Store Usage Metadata
let usage = null;

for await (const chunk of response) {
  if (chunk.usageMetadata) usage = chunk.usageMetadata;
  // ... process text
}

console.log("Tokens used:", usage?.totalTokenCount);
2. Handle Empty Chunks
for await (const chunk of response) {
  if (chunk.text) {
    // Only process if text exists
    process.stdout.write(chunk.text);
  }
}
3. Track Completion State
let isComplete = false;

for await (const chunk of response) {
  if (chunk.candidates?.[0]?.finishReason === "STOP") {
    isComplete = true;
  }
  // ... process
}
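STOP is not the only terminal value: the API can also report reasons such as MAX_TOKENS (output truncated) or SAFETY (output blocked). A minimal sketch that surfaces a non-STOP finish, continuing from the loop above and assuming those enum string values:

let finishReason: string | undefined;

for await (const chunk of response) {
  if (chunk.text) process.stdout.write(chunk.text);
  finishReason = chunk.candidates?.[0]?.finishReason ?? finishReason;
}

// Warn if generation ended for a reason other than a normal stop
if (finishReason && finishReason !== "STOP") {
  console.warn(`\nGeneration ended early: ${finishReason}`);
}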
4. Consider User Experience
- Show a loading indicator before streaming starts
- Display a cursor or typing indicator
- Allow cancellation for long responses
- Show progress for very long generations
console.log("Thinking..."); const response = await gemini.models.generateContentStream({...}); console.log("\nResponse:"); for await (const chunk of response) { process.stdout.write(chunk.text || ""); } console.log("\n[Complete]");
Advanced: Multi-Part Streaming
Stream with structured content:
const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: [
    {
      role: "user",
      parts: [
        { text: "Here's a document to analyze:" },
        { text: longDocument },
        { text: "Please summarize the key points." }
      ]
    }
  ],
  config: {
    systemInstruction: "You are a document analyst.",
    maxOutputTokens: 2000
  }
});

for await (const chunk of response) {
  process.stdout.write(chunk.text || "");
}
Key Takeaways
- Use generateContentStream() for streaming responses
- Uses the for await...of pattern (similar to OpenAI, different from Anthropic)
- Each chunk contains text and potentially usageMetadata
- Usage stats are only available in the final chunk
- Great for chat interfaces and long responses
- Always handle errors properly in the loop
Next Steps
Learn how to get structured, validated JSON output from Gemini!
Next: Lesson 2f - Structured Output
Quick Reference
// Basic streaming
const response = await gemini.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Hello"
});

let usage = null;

for await (const chunk of response) {
  if (chunk.text) process.stdout.write(chunk.text);
  if (chunk.usageMetadata) usage = chunk.usageMetadata;
}

console.log("\nTokens:", usage?.totalTokenCount);
Common Pitfalls
- Forgetting to check if chunk.text exists
- Not storing usageMetadata from the last chunk
- Using generateContent instead of generateContentStream (see the contrast sketch below)
- Not handling errors inside the async iteration
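For contrast, the non-streaming call uses generateContent and returns the whole response at once; a minimal sketch:

// Non-streaming: one awaited call, full text available only at the end
const result = await gemini.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Hello"
});

console.log(result.text);
console.log("Tokens:", result.usageMetadata?.totalTokenCount);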