Extended Thinking & Effort
For hard problems, Claude can spend extra compute thinking before it answers — improving accuracy on multi-step reasoning, tricky code, and math. You control roughly how much effort to spend.
The idea
- Less thinking = faster, cheaper — fine for simple, well-specified tasks.
- More thinking = better on genuinely hard problems, at higher latency/cost.
Newer models expose this as an effort control (and adapt thinking depth automatically); on those, you choose a tier rather than a raw token budget. Match the tier to the task.
Choosing depth
| Task | Suggested effort |
|---|---|
| Formatting, extraction, simple Q&A | Low |
| Everyday coding, drafting, analysis | Medium |
| Hard debugging, tricky algorithms, careful proofs | High |
Don't default everything to maximum — you pay in latency and cost for thinking the task doesn't need. Start medium; raise it only where quality demands.
Turn it on (API)
On the Messages API, enable thinking with a thinking block and a token budget. The budget is drawn from the same output pool, so budget_tokens must be less than max_tokens. The reply comes back as a thinking block followed by the answer text.
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000},
messages=[{"role": "user", "content": "Prove that 2^n > n^2 for all n >= 5."}],
)
for block in message.content:
if block.type == "thinking":
print("REASONING:", block.thinking)
elif block.type == "text":
print("ANSWER:", block.text)
A bigger budget buys more room to reason, not a guarantee Claude uses all of it — depth adapts to the problem. Set the budget like a ceiling, then tune down if latency hurts. Model IDs come from the models table; the exact parameter shape can differ on newer models, so confirm against the source linked above.
Practical notes
- Extended thinking pairs well with chain-of-thought prompting — but on reasoning models you often don't need to ask for step-by-step; the thinking happens internally.
- Thinking consumes tokens, which affects cost — budget accordingly.
- For agents, more effort on the planning step and less on routine tool calls is a good split.