Extended Thinking & Effort

Intermediate

For hard problems, Claude can spend extra compute thinking before it answers — improving accuracy on multi-step reasoning, tricky code, and math. You control roughly how much effort to spend.

The idea

Less thinking = faster, cheaper — fine for simple, well-specified tasks.
More thinking = better on genuinely hard problems, at higher latency/cost.

Newer models expose this as an effort control (and adapt thinking depth automatically); on those, you choose a tier rather than a raw token budget. Match the tier to the task.

Choosing depth

Task	Suggested effort
Formatting, extraction, simple Q&A	Low
Everyday coding, drafting, analysis	Medium
Hard debugging, tricky algorithms, careful proofs	High

Don't default everything to maximum — you pay in latency and cost for thinking the task doesn't need. Start medium; raise it only where quality demands.

Turn it on (API)

On the Messages API, enable thinking with a thinking block and a token budget. The budget is drawn from the same output pool, so budget_tokens must be less than max_tokens. The reply comes back as a thinking block followed by the answer text.

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Prove that 2^n > n^2 for all n >= 5."}],
)

for block in message.content:
    if block.type == "thinking":
        print("REASONING:", block.thinking)
    elif block.type == "text":
        print("ANSWER:", block.text)

A bigger budget buys more room to reason, not a guarantee Claude uses all of it — depth adapts to the problem. Set the budget like a ceiling, then tune down if latency hurts. Model IDs come from the models table; the exact parameter shape can differ on newer models, so confirm against the source linked above.

Practical notes

Extended thinking pairs well with chain-of-thought prompting — but on reasoning models you often don't need to ask for step-by-step; the thinking happens internally.
Thinking consumes tokens, which affects cost — budget accordingly.
For agents, more effort on the planning step and less on routine tool calls is a good split.

The idea​

Choosing depth​

Turn it on (API)​

Practical notes​

Next​

The idea

Choosing depth

Turn it on (API)

Practical notes

Next