
Your First Production API Call (Cost-Aware)

Intermediate

A toy API call is one line. A production call handles errors, streams output, watches cost, and keeps secrets safe. Let's build that, step by step.

Step 1 — Secrets & model from config

export ANTHROPIC_API_KEY="sk-ant-..." # never in source control

Keep the model ID in config rather than in scattered string literals, so that migrating to a new model is a one-line change. Pick the model deliberately; see Choosing a Model.
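
One way to centralize both values is a small settings loader that fails fast when the key is missing. This is a minimal sketch, not part of the SDK: `Settings` and `load_settings` are hypothetical names, and the default model ID simply mirrors the one used in Step 2.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two values this tutorial reads from config."""
    api_key: str
    model: str


def load_settings(env=None):
    """Read the API key and model ID from the environment.

    Raises RuntimeError when the key is missing, so a misconfigured
    deployment fails at startup rather than on the first request.
    """
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    # Assumed default; override with CLAUDE_MODEL to migrate without code changes.
    model = env.get("CLAUDE_MODEL", "claude-sonnet-4-6")
    return Settings(api_key=api_key, model=model)
```

Passing a plain dict as `env` keeps the loader easy to test without touching the real environment.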

Step 2 — A resilient, streamed call

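import os, time, random, anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = os.environ.get("CLAUDE_MODEL", "claude-sonnet-4-6")

# Transient failures worth retrying: 429, 5xx, and network errors.
# Other 4xx errors (bad request, bad key) are raised immediately.
RETRYABLE = (
    anthropic.RateLimitError,
    anthropic.InternalServerError,
    anthropic.APIConnectionError,
)

def ask_stream(prompt, system=None, max_tokens=1024):
    for attempt in range(5):
        try:
            with client.messages.stream(
                model=MODEL, max_tokens=max_tokens,
                system=system or anthropic.NOT_GIVEN,
                messages=[{"role": "user", "content": prompt}],
            ) as stream:
                # Print text deltas as they arrive.
                for text in stream.text_stream:
                    print(text, end="", flush=True)
                final = stream.get_final_message()
            print()
            usage = final.usage
            print(f"\n[tokens in/out: {usage.input_tokens}/{usage.output_tokens}]")
            return final
        except RETRYABLE:
            if attempt == 4: raise  # out of attempts: surface the error
            # Exponential backoff with jitter, capped at 30 seconds.
            time.sleep(min(2 ** attempt + random.random(), 30))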
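
To see what the sleep expression in the retry loop does, it can help to pull it into a function and print the schedule. The sketch below is illustrative only; `backoff_delay` is a hypothetical helper, and the `jitter` parameter exists just so the random part can be pinned in tests.

```python
import random


def backoff_delay(attempt, cap=30.0, jitter=random.random):
    """Exponential backoff with up to one second of jitter, capped at `cap` seconds."""
    return min(2 ** attempt + jitter(), cap)


if __name__ == "__main__":
    # Attempts 0..3 are the ones that sleep when the loop allows five tries.
    for attempt in range(4):
        print(f"attempt {attempt}: sleep about {backoff_delay(attempt):.2f}s")
```

With five attempts the waits are roughly 1-2 s, 2-3 s, 4-5 s, and 8-9 s, so the 30-second cap only matters if you raise the attempt count.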

Step 3 — Mind the cost

  • Log token usage (above) so you can see what each call costs.
  • Right-size max_tokens and the model; cap input with focused prompts.
  • For repeated stable prefixes, add prompt caching.
  • See Tokens & Pricing and Cost & Latency.
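
The logged token counts can be turned into a rough dollar figure per call. This is a sketch under stated assumptions: `Price`, `EXAMPLE_PRICE`, and `estimate_cost` are hypothetical names, and the rates are placeholders rather than real prices, so substitute the current figures for your model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Price in dollars per million tokens."""
    input_per_mtok: float
    output_per_mtok: float


# Placeholder rates for illustration only; these are NOT real prices.
EXAMPLE_PRICE = Price(input_per_mtok=1.0, output_per_mtok=5.0)


def estimate_cost(input_tokens, output_tokens, price=EXAMPLE_PRICE):
    """Estimate the cost of one call in dollars from its token usage."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000
```

In the streamed call, this would be fed `usage.input_tokens` and `usage.output_tokens` from the final message. It ignores any cache-related pricing, which would need its own terms.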

Step 4 — Handle the unhappy paths

  • Retry transient errors (429/5xx) with backoff (above); don't retry 400s.
  • Handle refusals gracefully.
  • Set a timeout and a cost/iteration budget for anything agentic.
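
Two of these rules are easy to express as small helpers: which status codes deserve a retry, and when an agentic loop should stop. The following is a minimal sketch; `should_retry` and `Budget` are hypothetical names, and the default limits are arbitrary placeholders to tune for your workload.

```python
import time


def should_retry(status_code):
    """Retry rate limits (429) and server errors (5xx); never other 4xx errors."""
    return status_code == 429 or 500 <= status_code < 600


class Budget:
    """Hypothetical guard for an agentic loop: iteration, token, and wall-clock limits."""

    def __init__(self, max_iterations=10, max_tokens=50_000,
                 max_seconds=120.0, clock=time.monotonic):
        self.max_iterations = max_iterations
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self.clock = clock
        self.started = clock()
        self.iterations = 0
        self.tokens = 0

    def charge(self, input_tokens, output_tokens):
        """Record one completed model call."""
        self.iterations += 1
        self.tokens += input_tokens + output_tokens

    def exhausted(self):
        """True once any of the three limits has been reached."""
        return (
            self.iterations >= self.max_iterations
            or self.tokens >= self.max_tokens
            or self.clock() - self.started >= self.max_seconds
        )
```

A loop would call `budget.charge(...)` with each response's usage and break as soon as `budget.exhausted()` returns true, instead of relying on the model to decide when to stop.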

Verify

Run it: you should see streamed output, a token-usage line, and graceful behavior if you force an error (e.g. a bad key → clean message, not a crash).
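
For the "clean message, not a crash" check, one option is a thin wrapper at the entry point that turns a fatal error into a single line and a nonzero exit code. This is a sketch, not a prescribed pattern: `run_safely` is a hypothetical helper, and which exception types count as fatal is left to the caller.

```python
import sys


def run_safely(call, fatal_errors=(Exception,), stream=sys.stderr):
    """Run call(); on a fatal error print one clean line and return exit code 1."""
    try:
        call()
        return 0
    except fatal_errors as exc:
        # One readable line instead of a traceback.
        print(f"Request failed: {type(exc).__name__}: {exc}", file=stream)
        return 1
```

With the SDK you might call `sys.exit(run_safely(lambda: ask_stream("Hello"), fatal_errors=(anthropic.APIError,)))`, so that a bad key produces one readable line rather than a stack trace.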
