# LLM Streaming Response Handler
Expert in building production-grade streaming interfaces for LLM responses that feel instant and responsive.
## When to Use

✅ **Use for:**

- Chat interfaces with typing animation
- Real-time AI assistants
- Code generation with live preview
- Document summarization with progressive display
- Any UI where users expect immediate feedback from LLMs

❌ **NOT for:**

- Batch document processing (no user watching)
- APIs that don't support streaming
- WebSocket-based bidirectional chat (use Socket.IO)
- Simple request/response (fetch is fine)

## Quick Decision Tree

```
Does your LLM interaction:
├── Need immediate visual feedback?         → Streaming
├── Display long-form content (>100 words)? → Streaming
├── User expects typewriter effect?         → Streaming
├── Short response (<50 words)?             → Regular fetch
└── Background processing?                  → Regular fetch
```
## Technology Selection

### Server-Sent Events (SSE) - Recommended

Why SSE over WebSockets for LLM streaming:

- **Simplicity:** HTTP-based, works with existing infrastructure
- **Auto-reconnect:** Built-in reconnection logic
- **Firewall-friendly:** Easier to route through proxies than WebSockets
- **One-way is enough:** LLMs only stream server → client
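As a sketch of how lightweight the wire format is: each SSE event is just a `data:` line followed by a blank line, so a one-line helper can emit one. `sseFrame` here is a hypothetical name, not part of any SDK.

```typescript
// Minimal sketch: one SSE event is "data: <payload>" plus a blank line.
// sseFrame is a hypothetical helper, not an official API.
function sseFrame(payload: object): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// On the client, the browser-native EventSource gives parsing and
// auto-reconnect for free on GET endpoints (POST bodies need fetch +
// ReadableStream, as shown later in this document):
//   const es = new EventSource('/api/stream');
//   es.onmessage = (e) => console.log(JSON.parse(e.data));
```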
**Timeline:**

- 2015-2020: WebSockets for everything
- 2020: SSE adoption for streaming APIs
- 2023+: SSE standard for LLM streaming (OpenAI, Anthropic)
- 2024: Vercel AI SDK popularizes SSE patterns

### Streaming APIs

| Provider | Streaming Method | Response Format |
|----------|------------------|-----------------|
| OpenAI | SSE | `data: {"choices":[{"delta":{"content":"token"}}]}` |
| Anthropic (Claude) | SSE | `data: {"type":"content_block_delta","delta":{"text":"token"}}` |
| Vercel AI SDK | SSE | Normalized across providers |

## Common Anti-Patterns

### Anti-Pattern 1: Buffering Before Display
Novice thinking: "Collect all tokens, then show complete response"
Problem: Defeats the entire purpose of streaming.
Wrong approach:
```typescript
// ❌ Waits for entire response before showing anything
const response = await fetch('/api/chat', { method: 'POST', body: prompt });
const fullText = await response.text();
setMessage(fullText); // User sees nothing until done
```
Correct approach:
```typescript
// ✅ Display tokens as they arrive
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.trim());

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      setMessage(prev => prev + data.content); // Update immediately
    }
  }
}
```
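The `data.content` shape in the loop above is provider-specific. As a hedged sketch, a small normalizer can map the OpenAI and Anthropic delta formats from the table earlier onto a plain token; `extractToken` is a hypothetical helper, not an SDK function.

```typescript
// Hedged sketch: normalize one parsed SSE payload into a text token.
// Field names follow the provider formats tabulated above.
function extractToken(data: any): string | null {
  // OpenAI chat completions: {"choices":[{"delta":{"content":"token"}}]}
  const openaiToken = data.choices?.[0]?.delta?.content;
  if (typeof openaiToken === 'string') return openaiToken;

  // Anthropic: {"type":"content_block_delta","delta":{"text":"token"}}
  if (data.type === 'content_block_delta' && typeof data.delta?.text === 'string') {
    return data.delta.text;
  }

  return null; // control events (e.g. message_start) carry no text
}
```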
**Timeline:**

- Pre-2023: Many apps buffered the entire response
- 2023+: Token-by-token display expected

### Anti-Pattern 2: No Stream Cancellation
Problem: User can't stop generation, wasting tokens and money.
Symptom: "Stop" button doesn't work or doesn't exist.
Correct approach:
```tsx
// ✅ AbortController for cancellation
const [abortController, setAbortController] =
  useState<AbortController | null>(null);

const streamResponse = async () => {
  const controller = new AbortController();
  setAbortController(controller);

  try {
    const response = await fetch('/api/chat', {
      signal: controller.signal,
      method: 'POST',
      body: JSON.stringify({ prompt })
    });

    // Stream handling...
  } catch (error) {
    if (error.name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  } finally {
    setAbortController(null);
  }
};

const cancelStream = () => {
  abortController?.abort();
};

return (
  <button onClick={cancelStream} disabled={!abortController}>
    Stop
  </button>
);
```
### Anti-Pattern 3: No Error Recovery
Problem: Stream fails mid-response, user sees partial text with no indication of failure.
Correct approach:
```typescript
// ✅ Error states and recovery
const [streamState, setStreamState] =
  useState<'idle' | 'streaming' | 'error' | 'complete'>('idle');
const [errorMessage, setErrorMessage] = useState<string | null>(null);

try {
  setStreamState('streaming');

  // Streaming logic...

  setStreamState('complete');
} catch (error) {
  setStreamState('error');

  if (error.name === 'AbortError') {
    setErrorMessage('Generation stopped');
  } else if (error.message.includes('429')) {
    setErrorMessage('Rate limit exceeded. Try again in a moment.');
  } else {
    setErrorMessage('Something went wrong. Please retry.');
  }
}
```
```tsx
// UI feedback
{/* retry re-runs the last prompt (implementation not shown) */}
{streamState === 'error' && (
  <div role="alert">
    {errorMessage}
    <button onClick={retry}>Retry</button>
  </div>
)}
```
### Anti-Pattern 4: Memory Leaks from Unclosed Streams
Problem: Streams not cleaned up, causing memory leaks.
Symptom: Browser slows down after multiple requests.
Correct approach:
```typescript
// ✅ Cleanup with useEffect
useEffect(() => {
  let reader: ReadableStreamDefaultReader | null = null;

  const streamResponse = async () => {
    const response = await fetch('/api/chat', { ... });
    reader = response.body!.getReader();

    // Streaming...
  };

  streamResponse();

  // Cleanup on unmount
  return () => {
    reader?.cancel();
  };
}, [prompt]);
```
### Anti-Pattern 5: No Typing Indicator Between Tokens
Problem: UI feels frozen between slow tokens.
Correct approach:
```css
/* ✅ Animated cursor during generation */
.typing-cursor {
  animation: blink 1s step-end infinite;
}

@keyframes blink {
  50% { opacity: 0; }
}
```
## Implementation Patterns

### Pattern 1: Basic SSE Stream Handler

```typescript
async function* streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.content) {
          yield data.content;
        }
        if (data.done) {
          return;
        }
      }
    }
  }
}
```
```typescript
// Usage
for await (const token of streamCompletion('Hello')) {
  console.log(token);
}
```
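One caveat the handler above glosses over: `fetch()` chunks do not align with SSE message boundaries, so a `data:` line can be split across two reads and a naive `chunk.split('\n')` would cut a token in half. A sketch of carrying the partial line over between reads; `splitSSELines` is a hypothetical helper, not part of any library.

```typescript
// Split an incoming chunk into complete SSE lines, carrying any trailing
// partial line over to the next call via `rest`.
function splitSSELines(
  buffer: string,
  chunk: string
): { lines: string[]; rest: string } {
  const combined = buffer + chunk;
  const parts = combined.split('\n');
  // The last element is either '' (chunk ended on a newline) or an
  // incomplete line that the next chunk will finish.
  const rest = parts.pop() ?? '';
  return { lines: parts.filter(l => l.trim()), rest };
}
```

In the streaming loop, keep `rest` in a variable outside the loop and pass it back in on every `reader.read()`.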
### Pattern 2: React Hook for Streaming

```typescript
import { useState, useCallback } from 'react';

interface UseStreamingOptions {
  onToken?: (token: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: Error) => void;
}

export function useStreaming(options: UseStreamingOptions = {}) {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const [abortController, setAbortController] =
    useState<AbortController | null>(null);

  const stream = useCallback(async (prompt: string) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsStreaming(true);
    setError(null);
    setContent('');

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let accumulated = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split('\n').filter(line => line.trim());

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));
            if (data.content) {
              accumulated += data.content;
              setContent(accumulated);
              options.onToken?.(data.content);
            }
          }
        }
      }

      options.onComplete?.(accumulated);
    } catch (err) {
      if ((err as Error).name !== 'AbortError') {
        setError(err as Error);
        options.onError?.(err as Error);
      }
    } finally {
      setIsStreaming(false);
      setAbortController(null);
    }
  }, [options]);

  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);

  return { content, isStreaming, error, stream, cancel };
}
```
```tsx
// Usage in component
function ChatInterface() {
  const { content, isStreaming, stream, cancel } = useStreaming({
    onToken: (token) => console.log('New token:', token),
    onComplete: (text) => console.log('Done:', text)
  });

  return (
    <div>
      <p>{content}</p>
      <button onClick={() => stream('Tell me a story')} disabled={isStreaming}>
        Generate
      </button>
      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}
```
### Pattern 3: Server-Side Streaming (Next.js)

```typescript
// app/api/chat/route.ts
import { OpenAI } from 'openai';

// Edge runtime keeps latency low; the Node.js runtime also supports
// streaming responses in the App Router.
export const runtime = 'edge';

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  // Convert the OpenAI stream to SSE format
  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;

          if (content) {
            const sseMessage = `data: ${JSON.stringify({ content })}\n\n`;
            controller.enqueue(encoder.encode(sseMessage));
          }
        }

        // Send completion signal
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
```
## Production Checklist

- [ ] AbortController for cancellation
- [ ] Error states with retry capability
- [ ] Typing indicator during generation
- [ ] Cleanup on component unmount
- [ ] Rate limiting on API route
- [ ] Token usage tracking
- [ ] Streaming fallback (if API fails)
- [ ] Accessibility (screen reader announces updates)
- [ ] Mobile-friendly (touch targets for stop button)
- [ ] Network error recovery (auto-retry on disconnect)
- [ ] Max response length enforcement
- [ ] Cost estimation before generation
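The network-error-recovery item on the checklist can be sketched as a generic exponential-backoff wrapper around the initial connection attempt. Names and defaults here are illustrative, not from any SDK, and a real implementation would also resume or restart the partially received stream.

```typescript
// Retry an async operation with exponential backoff.
// Defaults (3 attempts, 500ms base delay) are illustrative only.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Backoff doubles each attempt: 500ms, 1000ms, 2000ms, ...
        await new Promise(res => setTimeout(res, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

Wrap only the connection setup in `withRetry`; once tokens are flowing, a mid-stream failure is usually better surfaced to the user than silently retried, since the model would regenerate different text.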
## When to Use vs Avoid

| Scenario | Use Streaming? |
|----------|----------------|
| Chat interface | ✅ Yes |
| Long-form content generation | ✅ Yes |
| Code generation with preview | ✅ Yes |
| Short completions (<50 words) | ❌ No - regular fetch |
| Background jobs | ❌ No - use job queue |
| Bidirectional chat | ⚠️ Use WebSockets instead |

## Technology Comparison

| Feature | SSE | WebSockets | Long Polling |
|---------|-----|------------|--------------|
| Complexity | Low | Medium | High |
| Auto-reconnect | ✅ | ❌ | ❌ |
| Bidirectional | ❌ | ✅ | ❌ |
| Firewall-friendly | ✅ | ⚠️ | ✅ |
| Browser support | ✅ All modern | ✅ All modern | ✅ Universal |
| LLM API support | ✅ Standard | ❌ Rare | ❌ Not used |

## References

- /references/sse-protocol.md - Server-Sent Events specification details
- /references/vercel-ai-sdk.md - Vercel AI SDK integration patterns
- /references/error-recovery.md - Stream error handling strategies

## Scripts

- scripts/stream_tester.ts - Test SSE endpoints locally
- scripts/token_counter.ts - Estimate costs before generation
This skill guides: LLM streaming implementation | SSE protocol | Real-time UI updates | Cancellation | Error recovery | Token-by-token display