Errors, Rate Limits & Reliability
Production code talks to a network service, so it must expect failure. A little structure here is the difference between a flaky integration and a dependable one.
The error map
Typical HTTP statuses you'll handle:
| Status | Meaning | What to do |
|---|---|---|
| 400 | Invalid request | Fix the payload; don't retry as-is |
| 401 | Bad/missing API key | Check credentials |
| 403 | Not permitted | Check access/permissions |
| 429 | Rate limited | Back off and retry (respect retry-after) |
| 500/529 | Server error / overloaded | Retry with backoff |
The SDKs surface these as typed exceptions, so you can branch cleanly instead of parsing strings.
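As a rough illustration of the table above, the status-to-action mapping can be written as a small pure function. The names `Action` and `action_for_status` are hypothetical and not part of any SDK. How the status is exposed on a typed exception varies by client, so treat this as a sketch of the branching logic only.

```python
from enum import Enum


class Action(Enum):
    FIX_REQUEST = "fix the payload; do not retry as-is"
    CHECK_CREDENTIALS = "check the API key"
    CHECK_PERMISSIONS = "check access/permissions"
    RETRY_WITH_BACKOFF = "back off and retry"
    FAIL = "surface the error"


def action_for_status(status: int) -> Action:
    """Map an HTTP status to the handling suggested in the table (hypothetical helper)."""
    if status == 400:
        return Action.FIX_REQUEST
    if status == 401:
        return Action.CHECK_CREDENTIALS
    if status == 403:
        return Action.CHECK_PERMISSIONS
    # 429 and the whole 5xx range (including 529) are treated as transient.
    if status == 429 or 500 <= status <= 599:
        return Action.RETRY_WITH_BACKOFF
    # Anything else is not covered by the table, so do not retry blindly.
    return Action.FAIL
```

Keeping the decision in one function means the retry loop only has to ask whether the action is `RETRY_WITH_BACKOFF`.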
Retries with backoff
For transient errors (429, 5xx), retry with exponential backoff + jitter, capped:
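import random
import time

# RateLimitError and APIStatusError are your SDK's typed exceptions;
# should_retry(e) is your own predicate for transient failures (429, 5xx).
def create_with_retry(client, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return client.messages.create(...)
        except (RateLimitError, APIStatusError) as e:
            # Give up on the last attempt or on a non-transient error.
            if attempt == max_attempts - 1 or not should_retry(e):
                raise
            # Exponential backoff plus up to 1 s of jitter, capped at 30 s.
            time.sleep(min(2 ** attempt + random.random(), 30))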
(Many SDKs retry transient errors automatically, so check your client's default before adding your own; otherwise the two retry layers multiply.)
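To make the schedule concrete, the sketch below computes the delays before jitter for a loop like the one above. `base_delays` is a hypothetical helper written for this illustration. It assumes, as in the loop, that no sleep follows the final attempt.

```python
def base_delays(max_attempts: int = 5, cap: float = 30.0) -> list[float]:
    """Delays before jitter; no sleep follows the final attempt."""
    return [min(2.0 ** attempt, cap) for attempt in range(max_attempts - 1)]


print(base_delays())    # [1.0, 2.0, 4.0, 8.0]
print(base_delays(8))   # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With five attempts the worst case is roughly 15 seconds of waiting plus jitter, and the 30-second cap only comes into play once the attempt count reaches seven or more.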
Rate limits
Limits apply per account and tier, measured in requests and tokens per minute. When you exceed one, the API returns 429 with timing hints such as a retry-after header. To stay under the limit: respect retry-after, smooth bursts, batch offline work, and use a cheaper model (see Choosing a Model) for high-volume steps.
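One way to respect retry-after is to parse the header yourself when your client does not do it for you. The sketch below assumes the value follows the standard HTTP forms, either a number of seconds or an HTTP-date. `retry_after_seconds` and `wait_time` are hypothetical helper names, and how you read the header from an error object depends on your SDK.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Parse a Retry-After header value into seconds to wait, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():                       # delta-seconds form, e.g. "5"
        return float(value)
    try:
        when = parsedate_to_datetime(value)   # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:                   # treat a naive date as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def wait_time(retry_after: Optional[str], backoff: float) -> float:
    """Wait at least as long as the server asks; fall back to our own backoff."""
    hinted = retry_after_seconds(retry_after)
    return backoff if hinted is None else max(hinted, backoff)
```

Taking the larger of the hint and the local backoff keeps the client from retrying earlier than the server asked, while still backing off when no hint is present.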
Model migration
Model IDs are dated/versioned and get deprecated. Insulate yourself:
- Read the model ID from config, not scattered literals.
- Watch deprecations — see Deprecation & Migration Watch and the models table.
- Re-run your evals when you switch models.
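The first point can be sketched as a single lookup that every call site shares. The environment variable name `MODEL_ID` and the function `model_id` are assumptions made for this example; a settings file or secrets manager would work equally well.

```python
import os
from typing import Mapping


def model_id(env: Mapping[str, str] = os.environ) -> str:
    """Return the model ID from configuration; fail loudly if it is missing."""
    value = env.get("MODEL_ID", "").strip()   # MODEL_ID is a hypothetical variable name
    if not value:
        raise RuntimeError("MODEL_ID is not set; configure the model ID explicitly")
    return value
```

Call sites then pass `model=model_id()` instead of a literal, so a migration becomes a one-line config change followed by an eval run. Raising on a missing value, rather than falling back to a hard-coded default, avoids silently pinning a model that may later be deprecated.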