Safety, Refusals & Fallbacks

Intermediate

In production, your code must handle the case where Claude won't (or can't) answer as expected. Done well, this is invisible to users; done badly, it's a crash or a confusing reply.

Two different things

A model refusal: Claude declines a request (e.g. it judges it harmful). The response signals this, commonly with a stop_reason of "refusal" or with content that declines. Treat it as a normal outcome, not an error.
A classifier/safety block: a separate safety layer may block content. This can look different from a model refusal, so check for it separately.

Knowing which you got lets you respond appropriately rather than retrying blindly.
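One way to keep the two cases apart is to classify every reply before acting on it. The sketch below is only illustrative: `Reply`, `Outcome`, and the `blocked_by_filter` flag are hypothetical names for an application-level wrapper, not part of any SDK, and how a safety block actually surfaces depends on your setup.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ANSWERED = "answered"
    MODEL_REFUSAL = "model_refusal"
    SAFETY_BLOCK = "safety_block"


@dataclass
class Reply:
    # Hypothetical, simplified view of a response; not the SDK's own type.
    stop_reason: Optional[str]
    text: str
    # Assumed flag that your own integration sets when a safety layer blocks content.
    blocked_by_filter: bool = False


def classify(reply: Reply) -> Outcome:
    """Map a reply to one of three outcomes so callers never guess."""
    # Design choice in this sketch: check the separate safety layer first.
    if reply.blocked_by_filter:
        return Outcome.SAFETY_BLOCK
    if reply.stop_reason == "refusal":
        return Outcome.MODEL_REFUSAL
    return Outcome.ANSWERED
```

Callers can then branch on the `Outcome` value (for example, logging safety blocks separately from refusals) instead of resending the same request.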

Handle it gracefully

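import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(...)
# A refusal is a normal outcome, so branch on it instead of catching an exception.
if getattr(resp, "stop_reason", None) == "refusal":
    # Don't show a raw/empty result. Offer a safe fallback or a clarifying ask.
    show_user("I can't help with that as asked. Here's what I can do instead…")  # show_user: your app's UI helper
else:
    render(resp)  # render: your app's normal display path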
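The same check can be wrapped in a small function that always returns something displayable, which makes it easy to test without calling the API. This is a minimal sketch: `text_for_user` and `FALLBACK_MESSAGE` are hypothetical names, and the stand-in objects only mimic the `stop_reason` and `content` attributes of a response.

```python
from types import SimpleNamespace

FALLBACK_MESSAGE = "I can't help with that as asked. Here's what I can do instead…"


def text_for_user(resp) -> str:
    """Return display text: the fallback on a refusal or empty reply, else the reply text."""
    if getattr(resp, "stop_reason", None) == "refusal":
        return FALLBACK_MESSAGE
    # Join text blocks only; other block types are skipped.
    parts = [
        block.text
        for block in getattr(resp, "content", [])
        if getattr(block, "type", None) == "text"
    ]
    text = "".join(parts).strip()
    # An empty reply is treated like a refusal so the user never sees a blank result.
    return text or FALLBACK_MESSAGE


# Stand-in responses for illustration; real ones come from client.messages.create.
refused = SimpleNamespace(stop_reason="refusal", content=[])
answered = SimpleNamespace(
    stop_reason="end_turn",
    content=[SimpleNamespace(type="text", text="Hello")],
)

print(text_for_user(refused))
print(text_for_user(answered))
```

Returning a string instead of calling the UI directly keeps the refusal logic in one place and leaves rendering to the caller.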

Reduce unwanted refusals

Add legitimate context. A request can pattern-match to something sensitive when intent is benign; stating the real, legitimate purpose helps.
Be specific. Vague or edgy phrasing invites caution.
Don't fight it. If a request is genuinely disallowed, refusal is correct. Design a graceful path instead of trying to jailbreak.
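The first two tips can be sketched as a prompt builder that states the purpose and audience before the task. The function name and the `Context`/`Audience`/`Task` labels are assumptions made for illustration; nothing here guarantees a different model decision, it only makes the legitimate intent explicit.

```python
def build_prompt(task: str, purpose: str, audience: str) -> str:
    """Combine a specific task with its real purpose and audience."""
    task, purpose, audience = task.strip(), purpose.strip(), audience.strip()
    if not task or not purpose:
        # A vague or context-free request is what this helper is meant to prevent.
        raise ValueError("task and purpose are both required")
    lines = [f"Context: {purpose}"]
    if audience:
        lines.append(f"Audience: {audience}")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


prompt = build_prompt(
    task="List common phishing email red flags.",
    purpose="Security awareness training for our employees.",
    audience="Non-technical office staff",
)
print(prompt)
```

Only state context that is true; the goal is to remove ambiguity, not to talk the model into a disallowed request.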

Fallback patterns

A clarifying question instead of a dead end.
A safe alternative ("I can summarize the public info instead").
For pipelines, route to a human when confidence/eligibility is low.
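The three patterns above can be combined into one small decision function. This is a sketch under stated assumptions: `Decision`, `choose_fallback`, the `confidence` score, and the 0.7 threshold are all hypothetical and would come from your own pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str  # one of: "answer", "alternative", "clarify", "human_review"
    message: str


def choose_fallback(
    refused: bool,
    confidence: float,
    alternative: Optional[str] = None,
    threshold: float = 0.7,  # assumed cutoff; tune for your pipeline
) -> Decision:
    """Pick the next step for a reply instead of leaving a dead end."""
    if not refused:
        if confidence >= threshold:
            return Decision("answer", "")
        # Low confidence in a pipeline: hand off to a person.
        return Decision("human_review", "Routed to a reviewer.")
    if alternative:
        # Offer something safe that is still useful.
        return Decision("alternative", alternative)
    # No alternative available: ask for more detail.
    return Decision("clarify", "Could you tell me more about what you're trying to do?")
```

For example, `choose_fallback(True, 0.9, alternative="I can summarize the public info instead.")` selects the safe alternative, while a non-refused reply with `confidence=0.4` is routed to human review.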
