
Safety, Refusals & Fallbacks

Intermediate

In production, your code must handle the case where Claude won't (or can't) answer as expected. Done well, this is invisible to users; done badly, it's a crash or a confusing reply.

Two different things

  • A model refusal: Claude itself declines a request (e.g. it judges it harmful). The response signals this, commonly with a stop_reason of "refusal" and/or refusal text in the content. Treat it as a normal outcome, not an error.
  • A classifier/safety block: a separate safety layer, rather than the model, may block content. This can look different from a model refusal, so a check written for one will not necessarily catch the other.

Knowing which you got lets you respond appropriately rather than retrying blindly.
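
One way to make the distinction explicit is to map each response to a named outcome before deciding what to do with it. The sketch below assumes a model refusal shows up as stop_reason == "refusal", as described above. The safety_blocked flag is a hypothetical stand-in for however your stack reports a separate safety-layer block, which this page does not specify; Outcome and classify_outcome are illustrative names, not part of any SDK.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    OK = "ok"
    MODEL_REFUSAL = "model_refusal"
    SAFETY_BLOCK = "safety_block"


def classify_outcome(stop_reason: Optional[str], safety_blocked: bool = False) -> Outcome:
    """Map one response to a named outcome.

    `safety_blocked` is a hypothetical flag: set it from whatever signal
    your own integration uses to detect a safety-layer block.
    """
    if safety_blocked:
        return Outcome.SAFETY_BLOCK
    if stop_reason == "refusal":
        return Outcome.MODEL_REFUSAL
    return Outcome.OK
```

Downstream code can then branch on the Outcome value, for example logging the two non-OK cases separately so you can see which one users actually hit, instead of retrying the same request unchanged.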

Handle it gracefully

resp = client.messages.create(...)
if getattr(resp, "stop_reason", None) == "refusal":
    # Don't show a raw/empty result. Offer a safe fallback or a clarifying ask.
    show_user("I can't help with that as asked. Here's what I can do instead…")
else:
    render(resp)

Reduce unwanted refusals

  • Add legitimate context. A request can pattern-match to something sensitive when intent is benign; stating the real, legitimate purpose helps.
  • Be specific. Vague or edgy phrasing invites caution.
  • Don't fight it. If a request is genuinely disallowed, refusal is correct — design a graceful path, don't try to jailbreak.
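
As a rough illustration of the first two points, a small helper can attach the purpose and audience to a request before it is sent. This is only a sketch: add_context and the "Purpose:/Audience:/Request:" layout are assumptions made for this example, not a required format, and the context must be true for your application. It does not guarantee a different result.

```python
def add_context(request: str, purpose: str = "", audience: str = "") -> str:
    """Prefix a request with its stated purpose and audience.

    The labels used here are illustrative; any clear phrasing works.
    Empty fields are skipped so no blank labels are sent.
    """
    lines = []
    if purpose.strip():
        lines.append(f"Purpose: {purpose.strip()}")
    if audience.strip():
        lines.append(f"Audience: {audience.strip()}")
    lines.append(f"Request: {request.strip()}")
    return "\n".join(lines)


prompt = add_context(
    "Explain how phishing emails typically imitate bank notices.",
    purpose="Security-awareness training for employees",
    audience="Non-technical staff",
)
```

The resulting string would go into the user message in place of the bare request.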

Fallback patterns

  • A clarifying question instead of a dead end.
  • A safe alternative ("I can summarize the public info instead").
  • For pipelines, route to a human when confidence/eligibility is low.
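
The patterns above could be wired together roughly as follows. Everything here is a hypothetical sketch: Fallback, choose_fallback, the interactive and request_is_ambiguous inputs, and the message wording are assumptions for illustration, and the rule that a non-interactive pipeline always routes to a human is a simplification of "when confidence/eligibility is low".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fallback:
    kind: str      # "human", "clarify", "alternative", or "decline"
    message: str


def choose_fallback(
    interactive: bool,
    request_is_ambiguous: bool,
    safe_alternative: Optional[str] = None,
) -> Fallback:
    """Pick a fallback after a refusal or block (illustrative policy only)."""
    if not interactive:
        # Pipelines have no user to ask, so hand the item to a person.
        return Fallback("human", "Queued for human review.")
    if request_is_ambiguous:
        # A clarifying question instead of a dead end.
        return Fallback("clarify", "Could you tell me more about what you're trying to do?")
    if safe_alternative:
        # Offer something useful that is still allowed.
        return Fallback("alternative", f"I can't help with that as asked. {safe_alternative}")
    return Fallback("decline", "I can't help with that as asked.")
```

For example, choose_fallback(True, False, "I can summarize the public info instead.") returns an "alternative" fallback whose message combines the decline with the offer.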
