Sampling Controls: Temperature & Friends
When a model generates text, it picks the next token from a probability distribution. Sampling controls tune how it picks — how focused vs. how varied the output is.
The main dials
- Temperature: randomness. Low (≈0) = focused, deterministic-ish, repeatable; the model takes the most likely path. High = more varied and creative, but more prone to wandering or error.
- top-p (nucleus): restrict choices to the smallest set of most-likely tokens whose cumulative probability reaches p. A different way to bound randomness.
- top-k: only consider the k most likely tokens.
- stop sequences: strings that, when generated, end the response (handy for structured output).
You usually adjust temperature OR top-p, not both.
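The dials above can be sketched on a toy distribution. This is a minimal illustration, not how any particular model or API implements decoding: the function names and the `logits` values are made up for the example, and it assumes the stop sequence itself is dropped from the returned text.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        # Treat 0 as greedy: all probability mass on the top-scoring token.
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1 again."""
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}

def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return renormalize(dict(ranked[:k]))

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return renormalize(kept)

def sample(probs, rng=random):
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Toy next-token scores (invented for illustration).
logits = {"the": 2.0, "a": 1.0, "banana": 0.1, "zebra": -1.0}

cold = apply_temperature(logits, 0.2)   # sharply peaked
hot = apply_temperature(logits, 2.0)    # much flatter
nucleus = top_p_filter(apply_temperature(logits, 1.0), 0.8)
next_token = sample(nucleus)
```

With these toy scores, the cold setting puts over 99% of the probability on `the`, while the hot setting leaves it under half. The top-p filter at 0.8 keeps only `the` and `a`, so `next_token` is always one of those two.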
When to run cold vs hot
| Run cold (low temp) | Run hot (higher temp) |
|---|---|
| Extraction, classification, code | Brainstorming, names, creative copy |
| Anything you want reproducible | Exploring many options |
| Factual / structured output | Tone variety, ideation |
A good default for most work is a moderate-to-low temperature. Crank it up only when you want surprise.
:::note Newer models may hide these
Several recent Claude models adapt their own decoding and de-emphasize (or omit) temperature. If a knob isn't available, that's by design; shape behavior through the prompt and (where offered) the effort/thinking setting instead.
:::
Determinism caveat
Even at temperature 0, outputs aren't guaranteed bit-identical across runs/versions. Don't rely on exact reproducibility; rely on evals to catch drift.
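One way to act on this is a small eval that checks properties of the output rather than exact strings, and flags a drop in pass rate. The sketch below is only an assumption about how such a check might look: `fake_generate` stands in for a real model call, and the cases, baseline, and tolerance are invented for illustration.

```python
def pass_rate(generate, cases):
    """Fraction of cases whose output satisfies its check."""
    passed = sum(1 for prompt, check in cases if check(generate(prompt)))
    return passed / len(cases)

def detect_drift(generate, cases, baseline, tolerance=0.05):
    """Return (drifted, rate); drifted is True when the pass rate
    falls more than `tolerance` below the recorded baseline."""
    rate = pass_rate(generate, cases)
    return rate < baseline - tolerance, rate

# Hypothetical stand-in for a real model call; swap in your own client.
def fake_generate(prompt):
    canned = {"2+2=": "The answer is 4.", "Capital of France?": "Paris"}
    return canned.get(prompt, "")

# Each check tests a property of the output, not the exact wording.
cases = [
    ("2+2=", lambda out: "4" in out),
    ("Capital of France?", lambda out: "paris" in out.lower()),
]

drifted, rate = detect_drift(fake_generate, cases, baseline=1.0)
print(drifted, rate)  # prints: False 1.0
```

Because the checks tolerate rewording, the eval stays green when two runs differ only in phrasing and goes red only when the behavior itself changes.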