
Errors, Rate Limits & Reliability

Intermediate

Production code talks to a network service, so it must expect failure. A little structure here is the difference between a flaky integration and a dependable one.

The error map

Typical HTTP statuses you'll handle:

Status  | Meaning                   | What to do
400     | Invalid request           | Fix the payload; don't retry as-is
401     | Bad/missing API key       | Check credentials
403     | Not permitted             | Check access/permissions
429     | Rate limited              | Back off and retry (respect retry-after)
500/529 | Server error / overloaded | Retry with backoff

The SDKs surface these as typed exceptions, so you can branch cleanly instead of parsing strings.
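To make the table concrete, here is a minimal sketch that maps a status code to the suggested handling. The names `Action` and `action_for_status` are hypothetical and invented for illustration; they are not part of any SDK. With typed exceptions you would branch on the exception class rather than on the raw status, but the decision logic is the same.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the 'What to do' column."""
    FIX_REQUEST = "fix the payload; do not retry as-is"
    CHECK_CREDENTIALS = "check credentials"
    CHECK_PERMISSIONS = "check access/permissions"
    RETRY_WITH_BACKOFF = "back off and retry"
    UNKNOWN = "inspect manually"


def action_for_status(status: int) -> Action:
    """Map an HTTP status code to the handling suggested in the error map."""
    if status == 400:
        return Action.FIX_REQUEST
    if status == 401:
        return Action.CHECK_CREDENTIALS
    if status == 403:
        return Action.CHECK_PERMISSIONS
    # 429 and server-side errors (500, 529, other 5xx) are treated as transient.
    if status == 429 or 500 <= status <= 599:
        return Action.RETRY_WITH_BACKOFF
    return Action.UNKNOWN
```

Statuses outside the table fall through to `UNKNOWN`, so they are never retried by accident.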

Retries with backoff

For transient errors (429, 5xx), retry with exponential backoff + jitter, capped:

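import random
import time

# RateLimitError and APIStatusError are your SDK's typed exceptions; should_retry is your own helper.
def create_with_retry(client):
    for attempt in range(5):
        try:
            return client.messages.create(...)
        except (RateLimitError, APIStatusError) as e:
            # Give up on the final attempt or on non-transient errors.
            if attempt == 4 or not should_retry(e):
                raise
            # Exponential backoff plus jitter, capped at 30 seconds.
            time.sleep(min(2 ** attempt + random.random(), 30))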

(Many SDKs retry transient errors automatically — know your client's default before adding your own.)
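The retry loop calls `should_retry` without defining it. One possible version is sketched below, together with the delay calculation pulled out into its own function. Both `should_retry` and `backoff_delay` are illustrative helpers, and the sketch assumes the exception exposes an integer `status_code` attribute; check what your SDK's exceptions actually carry.

```python
import random


def should_retry(exc: Exception) -> bool:
    """Return True for errors worth retrying: 429 or any 5xx status."""
    # Assumption: the exception carries an integer `status_code` attribute.
    status = getattr(exc, "status_code", None)
    if not isinstance(status, int):
        return False
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, cap: float = 30.0) -> float:
    """Exponential delay (1s, 2s, 4s, ...) plus up to 1s of jitter, capped."""
    return min(2 ** attempt + random.random(), cap)
```

The jitter keeps many clients that failed at the same moment from retrying in lockstep, and the cap keeps late attempts from waiting unreasonably long.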

Rate limits

Limits apply per account and tier, measured in requests and tokens per minute. When you exceed one, you get a 429 with timing hints such as retry-after. Strategies: respect retry-after, smooth bursts, batch offline work, and use a cheaper model (see Choosing a Model) for high-volume steps.
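Respecting retry-after means turning the header value into a wait time. The sketch below handles the two forms the HTTP standard allows for this header, a number of seconds or an HTTP-date. The function name `retry_after_seconds` is hypothetical, and it assumes headers arrive as a plain mapping with lowercase keys; real response objects may differ.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(
    headers: Mapping[str, str],
    default: float = 1.0,
    now: Optional[datetime] = None,
) -> float:
    """Return how long to wait, using the retry-after header if present."""
    value = headers.get("retry-after")
    if value is None:
        return default
    value = value.strip()
    try:
        # Most common form: a number of seconds to wait.
        return max(0.0, float(value))
    except ValueError:
        pass
    try:
        # Alternative form: an HTTP-date to wait until.
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if target.tzinfo is None:
        # Treat a date without an offset as UTC.
        target = target.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (target - current).total_seconds())
```

When the header is missing or unparseable the function falls back to `default`, so the caller can still apply its own backoff.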

Model migration

Model IDs are dated/versioned and get deprecated. Insulate yourself:
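One common approach is to keep the model ID in a single configurable place, so a deprecation means changing one value rather than hunting through call sites. A minimal sketch, assuming an environment variable named `MODEL_ID` and a placeholder default ID (both hypothetical, chosen only for illustration):

```python
import os
from typing import Mapping

# Hypothetical placeholder; substitute a real model ID from your provider.
DEFAULT_MODEL = "example-model-2025-01-01"


def resolve_model(env: Mapping[str, str] = os.environ) -> str:
    """Return the model ID from the MODEL_ID variable, else the default."""
    # Assumption: MODEL_ID is a variable name chosen for this sketch.
    return env.get("MODEL_ID", "").strip() or DEFAULT_MODEL
```

Call sites then pass `resolve_model()` instead of a hard-coded string, and a migration becomes a configuration change.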
