Streaming & Multi-Turn Conversations
Two practical realities of building chat-like experiences on the API: stream so users see output immediately, and manage history yourself because the API is stateless.
Streaming
Without streaming, the user sees nothing until the whole reply has been generated. With streaming, text arrives incrementally as it is generated, so the first words appear almost immediately and the response feels much faster. Use the SDK's streaming helper:
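Python:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    model="claude-sonnet-4-6", max_tokens=1024,
    messages=[{"role": "user", "content": "Explain RAG in two sentences."}],
) as stream:
    # text_stream yields each piece of text as it is generated
    for text in stream.text_stream:
        print(text, end="", flush=True)

TypeScript:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const stream = client.messages.stream({
  model: "claude-sonnet-4-6", max_tokens: 1024,
  messages: [{ role: "user", content: "Explain RAG in two sentences." }],
});
for await (const event of stream) {
  // Only text deltas carry a `text` field; other delta types are skipped
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}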
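In a multi-turn chat you usually need the complete reply as well as the live output, so that it can be appended to the history afterwards. A minimal sketch of that pattern, assuming only that the stream yields plain text chunks the way `stream.text_stream` does; `print_and_collect` is a hypothetical helper name, not part of the SDK:

```python
import sys
from typing import Iterable, List, TextIO


def print_and_collect(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Echo each text chunk as it arrives and return the full reply."""
    parts: List[str] = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show partial output right away instead of buffering
        parts.append(chunk)
    return "".join(parts)
```

With the Python streaming call, this could be used as `assistant_text = print_and_collect(stream.text_stream)` inside the `with` block.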
Multi-turn: you hold the history
The API has no memory between calls: each request is processed independently. To continue a conversation, send the whole prior exchange back each time:
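# `client` is an anthropic.Anthropic() instance
messages = [{"role": "user", "content": "Hi, I'm planning a trip."}]
reply = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=messages)
assistant_text = reply.content[0].text  # text of the first content block
# Append the assistant's reply, then the next user turn
messages.append({"role": "assistant", "content": assistant_text})
messages.append({"role": "user", "content": "Make it 3 days."})
# Send the full `messages` list again
reply = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=messages)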
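The append-and-resend bookkeeping is easy to get wrong by hand, so it can help to wrap it in a small object. The following is a sketch under stated assumptions rather than an SDK feature: `Conversation` and its `send` argument are hypothetical names, and `send` stands in for whatever function actually calls the API and returns the reply text.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Holds the message history and resends all of it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        # `send` is a hypothetical callable: it receives the full history
        # and returns the assistant's reply as plain text.
        self._send = send
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        # Pass a copy so the callable cannot mutate the stored history.
        assistant_text = self._send(list(self.messages))
        self.messages.append({"role": "assistant", "content": assistant_text})
        return assistant_text
```

Injecting `send` keeps the history logic testable without network access; in real use it might be a thin wrapper around `client.messages.create(...)` that returns `reply.content[0].text`.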
Long conversations fill the window
As history grows, it takes up more of the context window, and because the full history is resent on every call, input cost rises with each turn. Strategies:
- Summarize/compact older turns into a short recap you carry forward.
- Trim irrelevant earlier turns.
- Pair with prompt caching to avoid re-paying for a stable prefix.
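The first two strategies can be combined: keep the most recent turns verbatim and fold everything older into a recap. The sketch below is one possible approach, not a prescribed one. `compact_history` and `summarize` are hypothetical names, message `content` is assumed to be a plain string, and the recap is prepended to the first kept user turn so the result still starts with a `user` message.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def compact_history(
    messages: List[Message],
    keep_last: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Fold all but the most recent messages into a short recap."""
    if keep_last < 1 or len(messages) <= keep_last:
        return list(messages)
    cut = len(messages) - keep_last
    # Start the kept tail on a user turn so the result begins with "user".
    while cut < len(messages) and messages[cut]["role"] != "user":
        cut += 1
    if cut == len(messages):
        return list(messages)  # no user turn to attach the recap to
    older, recent = messages[:cut], messages[cut:]
    recap = summarize(older)  # hypothetical: could itself be a model call
    first = {
        "role": "user",
        "content": f"Recap of the earlier conversation: {recap}\n\n{recent[0]['content']}",
    }
    return [first] + recent[1:]
```

The function returns a new list and leaves the original history untouched, so the full transcript can still be stored elsewhere while only the compacted version is sent.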