Skip to main content

Extended Thinking & Effort

Intermediate

For hard problems, Claude can spend extra compute thinking before it answers — improving accuracy on multi-step reasoning, tricky code, and math. You control roughly how much effort to spend.

The idea

  • Less thinking = faster, cheaper — fine for simple, well-specified tasks.
  • More thinking = better on genuinely hard problems, at higher latency/cost.

Newer models expose this as an effort control (and adapt thinking depth automatically); on those, you choose a tier rather than a raw token budget. Match the tier to the task.

Choosing depth

TaskSuggested effort
Formatting, extraction, simple Q&ALow
Everyday coding, drafting, analysisMedium
Hard debugging, tricky algorithms, careful proofsHigh

Don't default everything to maximum — you pay in latency and cost for thinking the task doesn't need. Start medium; raise it only where quality demands.

Turn it on (API)

On the Messages API, enable thinking with a thinking block and a token budget. The budget is drawn from the same output pool, so budget_tokens must be less than max_tokens. The reply comes back as a thinking block followed by the answer text.

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000},
messages=[{"role": "user", "content": "Prove that 2^n > n^2 for all n >= 5."}],
)

for block in message.content:
if block.type == "thinking":
print("REASONING:", block.thinking)
elif block.type == "text":
print("ANSWER:", block.text)

A bigger budget buys more room to reason, not a guarantee Claude uses all of it — depth adapts to the problem. Set the budget like a ceiling, then tune down if latency hurts. Model IDs come from the models table; the exact parameter shape can differ on newer models, so confirm against the source linked above.

Practical notes

  • Extended thinking pairs well with chain-of-thought prompting — but on reasoning models you often don't need to ask for step-by-step; the thinking happens internally.
  • Thinking consumes tokens, which affects cost — budget accordingly.
  • For agents, more effort on the planning step and less on routine tool calls is a good split.

Next