Errors, Rate Limits & Reliability

Intermediate

Production code talks to a network service, so it must expect failure. A little structure here is the difference between a flaky integration and a dependable one.

The error map

Typical HTTP statuses you'll handle:

Status	Meaning	What to do
400	Invalid request	Fix the payload; don't retry as-is
401	Bad/missing API key	Check credentials
403	Not permitted	Check access/permissions
429	Rate limited	Back off and retry (respect `retry-after`)
500/529	Server error / overloaded	Retry with backoff

The SDKs surface these as typed exceptions, so you can branch cleanly instead of parsing strings.
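
To make the table concrete, here is a minimal sketch that maps a raw status code to a handling decision. The `Action` enum and `action_for_status` function are hypothetical names made up for this illustration and are not part of any SDK. In real code you would usually branch on your SDK's typed exceptions rather than on bare integers.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical handling decisions, one per row of the error map."""
    FIX_REQUEST = "fix the payload; do not retry as-is"
    CHECK_CREDENTIALS = "check the API key"
    CHECK_PERMISSIONS = "check access/permissions"
    RETRY_WITH_BACKOFF = "back off and retry"
    FAIL = "do not retry"


def action_for_status(status: int) -> Action:
    """Map an HTTP status code to the handling suggested in the error map."""
    if status == 400:
        return Action.FIX_REQUEST
    if status == 401:
        return Action.CHECK_CREDENTIALS
    if status == 403:
        return Action.CHECK_PERMISSIONS
    if status == 429 or 500 <= status <= 599:
        return Action.RETRY_WITH_BACKOFF
    # Assumption: any status not listed in the table is treated as non-retryable.
    return Action.FAIL
```

Keeping the decision in one function means the retry loop and the logging code agree on what counts as transient.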

Retries with backoff

For transient errors (429, 5xx), retry with exponential backoff plus jitter, capping both the delay and the number of attempts:

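import random
import time

from anthropic import APIStatusError, RateLimitError  # assumes the Anthropic Python SDK

def should_retry(e):
    # Retry only transient failures: 429 and 5xx (including 529).
    return e.status_code == 429 or e.status_code >= 500

def create_with_retry(client, **kwargs):
    for attempt in range(5):
        try:
            return client.messages.create(**kwargs)
        except (RateLimitError, APIStatusError) as e:
            if attempt == 4 or not should_retry(e):
                raise  # out of attempts, or not a transient error
            # Exponential backoff plus up to 1 s of jitter, capped at 30 s.
            time.sleep(min(2 ** attempt + random.random(), 30))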

(Many SDKs retry transient errors automatically — know your client's default before adding your own.)

Rate limits

Limits apply per account and tier, measured in requests and tokens per minute. When you hit one, the API returns a 429 with timing hints such as the `retry-after` header. To stay under the limit: respect `retry-after`, smooth bursts, batch offline work, and use a cheaper model (see Choosing a Model) for high-volume steps.
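
One way to respect `retry-after` is to prefer the server's hint and fall back to capped exponential backoff when it is absent. The sketch below is illustrative only: `retry_delay` is a hypothetical helper, it assumes header names have already been lower-cased, and it handles only the plain number-of-seconds form of the header, not the HTTP-date form.

```python
import random
from typing import Mapping, Optional


def retry_delay(headers: Mapping[str, str], attempt: int, cap: float = 30.0,
                jitter: Optional[float] = None) -> float:
    """Return the number of seconds to wait before the next attempt.

    Prefers the server's retry-after hint; otherwise falls back to
    exponential backoff with jitter, capped at `cap` seconds.
    """
    hint = headers.get("retry-after")  # assumption: keys are lower-cased
    if hint is not None:
        try:
            return max(0.0, float(hint))
        except ValueError:
            pass  # not a plain number of seconds; fall through to backoff
    if jitter is None:
        jitter = random.random()  # up to 1 s of random jitter
    return min(2 ** attempt + jitter, cap)
```

Passing `jitter` explicitly is only there to make the function easy to test; production callers can leave it unset.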

Model migration

Model IDs are dated/versioned and get deprecated. Insulate yourself:

Read the model ID from config, not scattered literals.
Watch deprecations — see Deprecation & Migration Watch and the models table.
Re-run your evals when you switch models.
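
As a sketch of the first point, the model ID can live in a single environment variable that every call site reads through one helper. The variable name `APP_MODEL_ID` and the function `load_model_id` are hypothetical choices for this example; a config file or settings object would work just as well.

```python
import os
from typing import Mapping, Optional

MODEL_ENV_VAR = "APP_MODEL_ID"  # hypothetical variable name for this sketch


def load_model_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the model ID from configuration instead of a hard-coded literal."""
    source = os.environ if env is None else env
    model_id = source.get(MODEL_ENV_VAR, "").strip()
    if not model_id:
        # Fail loudly rather than silently falling back to a stale default.
        raise RuntimeError(f"Set {MODEL_ENV_VAR} to the model ID to use")
    return model_id
```

With this in place, switching models is a one-line config change, which also makes it easy to re-run evals against the old and new IDs side by side.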
