Safety, Refusals & Fallbacks
In production, your code must handle the case where Claude won't (or can't) answer as expected. Done well, this is invisible to users; done badly, it's a crash or a confusing reply.
Two different things
- A model refusal: Claude declines a request (e.g. it judges it harmful). The response signals this (commonly via a refusal stop_reason/content). Treat it as a normal outcome, not an error.
- A classifier/safety block: a separate safety layer may block content. This can look different from a model refusal.
Knowing which you got lets you respond appropriately rather than retrying blindly.
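One way to keep the two cases apart is to map every call to a single outcome value before any UI code sees it. The sketch below is illustrative only: Outcome, classify, and the blocked_by_safety_layer flag are hypothetical names. How a safety-layer block actually surfaces depends on your stack, so the flag is assumed to be set by your own integration code rather than read from the response.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ANSWERED = "answered"
    MODEL_REFUSAL = "model_refusal"
    SAFETY_BLOCK = "safety_block"


def classify(stop_reason: Optional[str],
             blocked_by_safety_layer: bool = False) -> Outcome:
    """Map raw signals to one outcome so callers never have to guess.

    Assumption: blocked_by_safety_layer is a hypothetical flag set by your
    own integration code; it is not a field on the API response.
    """
    if blocked_by_safety_layer:
        return Outcome.SAFETY_BLOCK
    if stop_reason == "refusal":
        return Outcome.MODEL_REFUSAL
    return Outcome.ANSWERED
```

Checking the safety-layer flag first is a design choice: if both signals are present, the block is probably the more specific explanation.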
Handle it gracefully
resp = client.messages.create(...)
if getattr(resp, "stop_reason", None) == "refusal":
    # Don't show a raw/empty result. Offer a safe fallback or a clarifying ask.
    show_user("I can't help with that as asked. Here's what I can do instead…")
else:
    render(resp)
Reduce unwanted refusals
- Add legitimate context. A request can pattern-match to something sensitive when intent is benign; stating the real, legitimate purpose helps.
- Be specific. Vague or edgy phrasing invites caution.
- Don't fight it. If a request is genuinely disallowed, refusal is correct. Design a graceful path instead of trying to jailbreak.
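The first two tips can be sketched as a small prompt builder that states the purpose and audience before a specific task. build_request and its field labels are hypothetical, and the phishing-training scenario is invented for illustration. Stating context like this tends to help but does not guarantee a request will be accepted.

```python
def build_request(task: str, purpose: str = "", audience: str = "") -> str:
    """Prepend the legitimate purpose and audience to a specific task.

    Hypothetical helper: the "Purpose/Audience/Task" labels are just one
    readable layout, not a format the API requires.
    """
    parts = []
    if purpose:
        parts.append(f"Purpose: {purpose.strip()}")
    if audience:
        parts.append(f"Audience: {audience.strip()}")
    parts.append(f"Task: {task.strip()}")
    return "\n".join(parts)


prompt = build_request(
    task="List common phishing email red flags for a staff training slide.",
    purpose="Internal security-awareness training",
    audience="Non-technical employees",
)
```

Only real context belongs here; the point is to say what is actually true about the request, not to dress it up.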
Fallback patterns
- A clarifying question instead of a dead end.
- A safe alternative ("I can summarize the public info instead").
- For pipelines, route to a human when confidence/eligibility is low.
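A minimal sketch of these patterns, assuming an interactive path and a pipeline path are handled separately. interactive_fallback, pipeline_route, the message wording, and the 0.7 threshold are all hypothetical; where a confidence score comes from is left to your application.

```python
from typing import Optional


def interactive_fallback(safe_alternative: Optional[str] = None) -> str:
    """Return a user-facing message that never ends in a dead end."""
    prefix = "I can't help with that as asked."
    if safe_alternative:
        # Offer the safe alternative when one is known.
        return f"{prefix} {safe_alternative}"
    # Otherwise ask a clarifying question.
    return f"{prefix} Could you tell me more about what you're trying to do?"


def pipeline_route(confidence: float, threshold: float = 0.7) -> str:
    """Route low-confidence items to a person.

    Assumption: 0.7 is an arbitrary placeholder threshold; tune it to
    your own data.
    """
    return "human_review" if confidence < threshold else "auto"
```

For example, interactive_fallback("I can summarize the public info instead.") yields a reply with a concrete next step, while pipeline_route(0.4) sends the item to review.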