Sampling Controls: Temperature & Friends

Intermediate

When a model generates text, it picks the next token from a probability distribution. Sampling controls tune how it picks — how focused vs. how varied the output is.

The main dials

Temperature: randomness. Low (≈0) = focused, deterministic-ish, repeatable; the model almost always takes the most likely token. High = more varied and creative, but more prone to wandering or error.
top-p (nucleus): restrict choices to the smallest set of top-ranked tokens whose cumulative probability reaches p. A different way to bound randomness.
top-k: only consider the k most likely tokens.
stop sequences: strings that, when generated, end the response (handy for structured output).
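
To make the four dials concrete, here is a minimal sketch in plain Python that works on a toy distribution. It is not how any particular model or API implements decoding; the function names (`apply_temperature`, `top_k_filter`, `top_p_filter`, `truncate_at_stop`) and the example scores are hypothetical, chosen only for illustration.

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat temperature 0 as greedy: all probability mass on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest stop sequence, if one appears."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Hypothetical scores for four candidate tokens.
logits = [2.0, 1.0, 0.5, -1.0]
cold = apply_temperature(logits, 0.2)  # mass concentrates on the top token
hot = apply_temperature(logits, 2.0)   # mass spreads across more tokens
```

Comparing `cold[0]` with `hot[0]` shows the effect of temperature directly: the same scores yield a much more peaked distribution when run cold. The two filters then trim the tail in different ways, by count (top-k) or by cumulative mass (top-p), before a token would be drawn.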

You usually adjust temperature OR top-p, not both.

When to run cold vs hot

| Run cold (low temp) | Run hot (higher temp) |
| --- | --- |
| Extraction, classification, code | Brainstorming, names, creative copy |
| Anything you want reproducible | Exploring many options |
| Factual / structured output | Tone variety, ideation |

A good default for most work is moderate-to-low. Crank it up only when you want surprise.

:::note Newer models may hide these
Several recent Claude models adapt their own decoding and de-emphasize (or omit) temperature. If a knob isn't available, that's by design: shape behavior through the prompt and (where offered) the effort/thinking setting instead.
:::

Determinism caveat

Even at temperature 0, outputs aren't guaranteed bit-identical across runs/versions. Don't rely on exact reproducibility; rely on evals to catch drift.
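
One way to "rely on evals" rather than exact string equality is to score outputs against expected answers and compare the pass rate to a stored baseline. The sketch below assumes a hypothetical `generate(prompt)` callable standing in for a real model call; `fake_generate`, the test cases, and the 0.05 tolerance are all illustrative, not recommended values.

```python
def normalize(text):
    """Collapse case and whitespace so trivial wording differences don't count as drift."""
    return " ".join(text.lower().split())

def pass_rate(generate, cases):
    """Fraction of cases whose output contains the expected answer after normalization."""
    passed = 0
    for prompt, expected in cases:
        if normalize(expected) in normalize(generate(prompt)):
            passed += 1
    return passed / len(cases)

def has_drifted(current_rate, baseline_rate, tolerance=0.05):
    """Flag drift only when the pass rate falls meaningfully below the baseline."""
    return current_rate < baseline_rate - tolerance

# Stand-in for a real model call (hypothetical canned outputs).
def fake_generate(prompt):
    canned = {
        "Capital of France?": "The capital is  Paris.",
        "2 + 2 =": "4",
        "Opposite of hot?": "Warm",
    }
    return canned[prompt]

cases = [
    ("Capital of France?", "paris"),
    ("2 + 2 =", "4"),
    ("Opposite of hot?", "cold"),
]
rate = pass_rate(fake_generate, cases)
```

Because the check tolerates differences in case and spacing, two runs that phrase the same answer slightly differently both pass, while a genuine change in behavior lowers the pass rate and trips `has_drifted`.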
