Intermediate

The Trust Ladder

"How much should I let the AI just do?" is the question behind almost every agent decision — Claude Code permissions, auto-approve settings, whether to let a script run unattended. People tend to answer it as a single on/off switch: either you babysit everything, or you let it loose.

Here's a lens AILmanac uses instead:

Autonomy isn't a switch, it's a ladder. You climb it one rung at a time, and the rung you stand on should be set by how bad a mistake would be — not by how much you trust the model.

The key insight is that the right amount of autonomy has almost nothing to do with how "smart" the AI is. It's about blast radius (how much damage a wrong action does) and reversibility (how easily you can undo it). A brilliant model doing an irreversible thing unsupervised is a worse setup than a mediocre model doing a reversible one.

The five rungs

We find it useful to think of five distinct rungs, from least to most autonomy:

| Rung | What the AI does | When it's appropriate | What makes it safe |
|---|---|---|---|
| 1. Suggest only | Tells you what it would do; takes no action | High-stakes or irreversible work; a domain you don't yet trust it in; you're still learning what it's good at | You are the executor. Nothing happens without you doing it by hand. |
| 2. Draft for review | Produces the actual artifact (code, email, query) but stops before applying it | The output is concrete and you can eyeball it faster than you could write it | A real human read before anything takes effect. A diff you actually look at, not skim. |
| 3. Act on reversible things | Executes directly, but only on low-stakes, easily-undone actions | The action has a clean undo: edits in version control, writes to a scratch branch, anything a single command rolls back | Reversibility is the guardrail. The cost of a mistake is "undo it," not "explain it to legal." |
| 4. Act then report | Does the work autonomously, then shows you exactly what it did | Repetitive, well-scoped tasks where reviewing after is cheaper than gating before | A complete, honest audit trail (a log, a diff, a summary) that you actually read afterward. |
| 5. Act autonomously within guardrails | Runs unattended inside hard limits | Narrow, well-understood loops you've watched succeed many times | The guardrails do the supervising. Hard boundaries the AI cannot cross, plus a kill switch. |
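
To make the five rungs concrete, here is a minimal Python sketch that encodes them and picks a ceiling from reversibility and blast radius alone. The names `Rung`, `Action`, and `ceiling_for`, the three-level blast-radius scale, and the specific thresholds are all assumptions made for illustration; the ladder itself does not prescribe them. Note that the function takes no "model capability" argument at all.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    SUGGEST_ONLY = 1
    DRAFT_FOR_REVIEW = 2
    ACT_ON_REVERSIBLE = 3
    ACT_THEN_REPORT = 4
    AUTONOMOUS_WITHIN_GUARDRAILS = 5


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool   # is there a clean undo, such as a single rollback command?
    blast_radius: str  # "low", "medium", or "high" (hypothetical scale)


def ceiling_for(action: Action) -> Rung:
    """Highest rung the action could justify; thresholds are illustrative."""
    if not action.reversible:
        # Irreversible work never gets executed by the agent in this sketch.
        if action.blast_radius in ("low", "medium"):
            return Rung.DRAFT_FOR_REVIEW
        return Rung.SUGGEST_ONLY
    if action.blast_radius == "low":
        return Rung.AUTONOMOUS_WITHIN_GUARDRAILS
    if action.blast_radius == "medium":
        return Rung.ACT_ON_REVERSIBLE
    # "high" or any unrecognized value falls back to a cautious rung.
    return Rung.DRAFT_FOR_REVIEW


print(ceiling_for(Action("delete production records", False, "high")).name)
print(ceiling_for(Action("reformat files tracked in git", True, "low")).name)
```

The result is a ceiling, not a recommendation: an action that could justify rung 5 still starts lower until you have watched it succeed.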

How to use the ladder

Three rules make this practical:

Start one rung lower than feels necessary. It's cheap to climb a rung once you've watched something work; it's expensive to clean up after granting too much too soon. The first time you point an agent at a new kind of task, drop to Suggest or Draft even if you suspect it can handle more.

Set the rung by the worst case, not the average case. If a task is reversible 95% of the time but the other 5% touches production data, you set the rung for the 5%. The blast radius of the worst plausible action is your ceiling.

Climb per-task, not per-tool. The same AI can be at rung 4 for "format my code" and rung 1 for "delete records from the database," in the same session. The ladder is about the action, not a global trust setting you flip once.
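
The three rules can be sketched in a few lines of Python. This is one possible reading, not a definitive formula: the helper names `task_ceiling` and `starting_rung`, the example tasks, and the per-action ceilings (integers 1 to 5) are hypothetical, and "one rung lower" is interpreted here as one below the ceiling.

```python
def task_ceiling(action_ceilings: dict[str, int]) -> int:
    """Rule 2: the worst plausible action sets the ceiling for the whole task."""
    return min(action_ceilings.values())


def starting_rung(ceiling: int) -> int:
    """Rule 1: start one rung below the ceiling, but never below rung 1."""
    return max(1, ceiling - 1)


# Rule 3: rungs are tracked per task, not per tool or per session.
# Each task lists its plausible actions with an assumed ceiling for each.
tasks = {
    "format my code": {"rewrite files tracked in git": 5},
    "clean up old records": {
        "select candidate rows": 5,
        "delete rows in production": 1,
    },
}

rungs = {name: starting_rung(task_ceiling(actions)) for name, actions in tasks.items()}
print(rungs)  # {'format my code': 4, 'clean up old records': 1}
```

The second task shows the worst-case rule at work: a single production delete among otherwise harmless actions pins the whole task to rung 1.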

Mapping it to Claude Code

Claude Code is a clean place to see the ladder in action, because its permission system is essentially a set of dials for choosing your rung:

  • Rungs 1–2 are the default cautious posture: Claude proposes edits and commands, and you approve each one. You're reviewing every diff before it lands.
  • Rung 3 is allowing specific reversible tool calls, such as file edits inside a git repo that you can undo with git restore or runs on a throwaway branch, while still gating anything destructive.
  • Rung 4 is allow-listing categories of safe actions so Claude proceeds without prompting on those, then reading the transcript and diffs afterward.
  • Rung 5 is fuller autonomy for a narrow, proven loop — and it's only safe when real guardrails are in place: scoped permissions, a constrained working directory, and the ability to stop it.
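
The allow-listing idea behind rungs 3 to 5 can be modeled as a small gate that checks each proposed command against deny and allow patterns and falls back to asking. The sketch below is a toy model in Python, not Claude Code's actual matching logic or settings syntax; the function `gate` and the pattern lists are hypothetical, and the shell-style wildcards come from Python's standard `fnmatch` module.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration only.
ALLOW = ["git status", "git diff*", "npm run lint"]
DENY = ["rm -rf*", "git push*"]


def gate(command: str, allow: list[str], deny: list[str]) -> str:
    """Return "deny", "allow", or "ask" for a proposed command."""
    # Deny rules win, so a broad allow pattern cannot unlock a forbidden action.
    if any(fnmatchcase(command, pattern) for pattern in deny):
        return "deny"
    if any(fnmatchcase(command, pattern) for pattern in allow):
        return "allow"
    # Anything not explicitly listed drops back to human approval (rungs 1-2).
    return "ask"


for cmd in ["git diff HEAD~1", "git push origin main", "npm install left-pad"]:
    print(cmd, "->", gate(cmd, ALLOW, DENY))
```

Two design choices carry the safety here: deny is checked before allow, and the default for unlisted commands is to ask rather than proceed.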

What lets you climb safely is writing the guardrails down, and the place to do that is your CLAUDE.md: what's always allowed, what must never happen, which paths are off-limits, when to stop and ask. Guardrails you only hold in your head can't shape an agent's behavior; guardrails written into CLAUDE.md can. CLAUDE.md is guidance the agent reads rather than an enforced boundary, so for limits it must not be able to cross (the rung 5 case), back the written rules with the permission settings described above. If you're not sure how to phrase them, the CLAUDE.md Generator gives you a structured starting point.

The honest summary: don't grant autonomy because the AI seems capable. Grant it because the action is reversible, the blast radius is small, and the guardrails are written down. Then climb one rung at a time as the evidence comes in.