Streaming & Multi-Turn Conversations

Intermediate

Building chat-like experiences on the API comes with two practical realities: stream responses so users see output immediately, and manage the conversation history yourself, because the API is stateless.

Streaming

Without streaming, the user waits until the whole reply has been generated. With streaming, text arrives in chunks as it is generated, so the response feels much faster. Use the SDK's streaming helper:

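import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    model="claude-sonnet-4-6", max_tokens=1024,
    messages=[{"role": "user", "content": "Explain RAG in two sentences."}],
) as stream:
    # Print each text chunk as soon as it arrives
    for text in stream.text_stream:
        print(text, end="", flush=True)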
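A streamed reply usually still needs to be kept once it finishes, for example so it can be appended to the conversation history. The following is a minimal sketch, assuming only that the stream yields text chunks the way `stream.text_stream` does. `collect_stream` is a hypothetical helper, not part of the SDK, and a plain list stands in for the live stream so the example runs offline.

```python
from typing import Callable, Iterable


def collect_stream(chunks: Iterable[str], on_chunk: Callable[[str], None]) -> str:
    """Forward each chunk to `on_chunk` as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        on_chunk(chunk)       # e.g. lambda t: print(t, end="", flush=True)
        parts.append(chunk)   # keep the chunk so the full reply can be rebuilt
    return "".join(parts)


# A list of strings stands in for `stream.text_stream` (assumption for illustration).
fake_stream = ["RAG retrieves ", "documents, ", "then generates."]
shown = []
full_text = collect_stream(fake_stream, shown.append)
```

With a real stream, passing `stream.text_stream` as `chunks` and a printing callback as `on_chunk` would display the text live while still returning the complete reply.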

Multi-turn: you hold the history

The API keeps no memory between calls: each request is processed on its own, and the model sees only the messages included in it. To continue a conversation, send the whole prior exchange back with every request:

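messages = [{"role": "user", "content": "Hi, I'm planning a trip."}]
# `client` is the same SDK client used in the streaming example
reply = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=messages)
assistant_text = reply.content[0].text  # text of the first content block
# Append the assistant's reply, then the next user turn:
messages.append({"role": "assistant", "content": assistant_text})
messages.append({"role": "user", "content": "Make it 3 days."})
# Send the full `messages` list again
reply = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=messages)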
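The bookkeeping above can be wrapped in a small helper so that no turn is forgotten. This is a sketch under stated assumptions: `Conversation` and `fake_send` are hypothetical names, and `send` is any function that takes the full history and returns the reply text. Here a stand-in replaces the real API call so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Holds the message history and resends all of it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        self.send = send                       # hypothetical: full history in, reply text out
        self.messages: List[Message] = []

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send(list(self.messages))  # pass a copy of the full history
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: List[Message]) -> str:
    # Stand-in for a real API call; reports how many messages it was given.
    return f"received {len(history)} message(s)"


chat = Conversation(fake_send)
first = chat.ask("Hi, I'm planning a trip.")
second = chat.ask("Make it 3 days.")
```

In real use, `send` would wrap a call such as `client.messages.create(...)` and return the reply's text. The second call receives three messages, which shows the history growing with each turn.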

Long conversations fill the window

As the history grows, it takes up more of the context window, and because the full history is resent on every call, the cost of each request rises with it. Strategies:

  • Summarize/compact older turns into a short recap you carry forward.
  • Trim irrelevant earlier turns.
  • Pair with prompt caching so a stable prefix is not billed at the full input rate on every call.
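The first two strategies can be combined in one step: drop older turns and carry a recap in their place. This is one possible sketch, not a prescribed method. `compact_history` and `fake_summarize` are hypothetical names, message `content` is assumed to be a plain string, and the summarizer is a stand-in (in practice it might be another model call). The recap is folded into the first kept user message so the list still starts with a user turn and roles keep alternating.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def compact_history(
    messages: List[Message],
    keep_recent: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Replace older messages with a recap prepended to the first kept user turn."""
    if len(messages) <= keep_recent:
        return list(messages)
    cut = len(messages) - keep_recent
    # Move the cut forward until the kept tail starts with a user turn.
    while cut < len(messages) and messages[cut]["role"] != "user":
        cut += 1
    if cut >= len(messages):
        return list(messages)  # no user turn to carry the recap; leave unchanged
    older, recent = messages[:cut], messages[cut:]
    recap = summarize(older)
    first = {
        "role": "user",
        "content": f"Recap of earlier conversation: {recap}\n\n{recent[0]['content']}",
    }
    return [first] + recent[1:]


def fake_summarize(older: List[Message]) -> str:
    # Stand-in summarizer (assumption); a real one would condense the text.
    return f"{len(older)} earlier messages about a trip"


history = [
    {"role": "user", "content": "Hi, I'm planning a trip."},
    {"role": "assistant", "content": "Where to?"},
    {"role": "user", "content": "Lisbon."},
    {"role": "assistant", "content": "How long?"},
    {"role": "user", "content": "Make it 3 days."},
    {"role": "assistant", "content": "Here is a 3-day plan."},
]
compacted = compact_history(history, keep_recent=3, summarize=fake_summarize)
```

Here `keep_recent=3` would start the kept tail on an assistant turn, so the cut moves forward by one and only the last two messages are kept, with the recap of the four older ones attached to the user turn.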
