
Sampling Controls: Temperature & Friends

Intermediate

When a model generates text, it picks the next token from a probability distribution. Sampling controls tune how it picks — how focused vs. how varied the output is.

The main dials

  • Temperature — randomness. Low (≈0) = focused, deterministic-ish, repeatable; the model takes the most likely path. High = more varied and creative, but more prone to wandering or error.
  • top-p (nucleus) — restrict choices to the smallest set of tokens whose probabilities sum to p. A different way to bound randomness.
  • top-k — only consider the k most likely tokens.
  • stop sequences — strings that, when generated, end the response (handy for structured output).
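The dials above can be sketched in code. Below is a minimal, self-contained Python sketch of how temperature, top-k, top-p, and stop sequences act on a toy list of logits (raw model scores). It is an illustration, not how any particular model or API implements decoding. The function names (`softmax_with_temperature`, `top_k_filter`, `top_p_filter`, `sample`, `apply_stop_sequences`) are hypothetical. The order in which the filters run, and whether the stop string is dropped from the output, are assumptions; real implementations differ.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities, dividing by temperature first."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]  # low T sharpens, high T flattens
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def _renormalize(probs, keep):
    """Zero out tokens not in `keep`, then rescale so the rest sum to 1."""
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose probabilities sum to at least p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick one token index (assumed order: temperature, then top-k, then top-p)."""
    probs = softmax_with_temperature(logits, temperature)
    if top_k is not None:
        probs = top_k_filter(probs, top_k)
    if top_p is not None:
        probs = top_p_filter(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def apply_stop_sequences(text, stops):
    """Cut text at the earliest stop sequence (this sketch drops the stop string)."""
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
rng = random.Random(0)          # seeded so this demo is repeatable
print(sample(logits, temperature=0))  # 0: greedy always takes the top token
print(sample(logits, temperature=1.0, top_p=0.9, rng=rng))
print(apply_stop_sequences("answer: 42\n###extra", ["###"]))
```

Dividing logits by a small temperature widens the gaps between them before the softmax, which is why low settings concentrate probability on the top token.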

You usually adjust temperature OR top-p, not both.
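The two knobs overlap because both change how many tokens are realistically in play. The sketch below uses made-up logits and a hypothetical `nucleus_size` helper to count how many tokens a top-p of 0.95 keeps at different temperatures. Lowering the temperature alone already shrinks that set, so tuning both at once makes it hard to tell which one caused a change.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, p):
    """How many top tokens are needed for cumulative probability to reach p."""
    cumulative = 0.0
    for count, q in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += q
        if cumulative >= p:
            return count
    return len(probs)

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, nucleus_size(softmax(logits, t), 0.95))
# 0.5 2
# 1.0 3
# 2.0 4
```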

When to run cold vs hot

| Run cold (low temp) | Run hot (higher temp) |
|---|---|
| Extraction, classification, code | Brainstorming, names, creative copy |
| Anything you want reproducible | Exploring many options |
| Factual / structured output | Tone variety, ideation |

A good default for most work is moderate-to-low. Crank it up only when you want surprise.

:::note Newer models may hide these
Several recent Claude models adapt their own decoding and de-emphasize (or omit) temperature. If a knob isn't available, that's by design: shape behavior through the prompt and (where offered) the effort/thinking setting instead.
:::

Determinism caveat

Even at temperature 0, outputs aren't guaranteed bit-identical across runs/versions. Don't rely on exact reproducibility; rely on evals to catch drift.
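One reason temperature 0 is not a reproducibility guarantee: floating-point addition is not associative. The same logit computed with a different summation order (for example, because of different batching or parallel reductions on the hardware) can differ in its last bit, and when two tokens are nearly tied, that is enough to flip a greedy pick. The toy sketch below uses made-up numbers and a hypothetical `greedy_pick` helper; it illustrates the effect and does not describe any provider's serving stack.

```python
def greedy_pick(logits):
    """Temperature ~0 behaves like argmax; ties go to the lowest index here."""
    return max(range(len(logits)), key=lambda i: logits[i])

# The same three contributions to token 1's logit, summed in two orders.
parts = [0.1, 0.2, 0.3]
order_1 = (parts[0] + parts[1]) + parts[2]  # 0.6000000000000001
order_2 = parts[0] + (parts[1] + parts[2])  # 0.6

token_0_logit = 0.6  # a competing token that is almost exactly tied
run_1 = greedy_pick([token_0_logit, order_1])
run_2 = greedy_pick([token_0_logit, order_2])
print(order_1 == order_2, run_1, run_2)  # False 1 0
```

The "same" computation picks different tokens, and every later token then conditions on that different choice, which is why evals beat exact-match comparisons for catching drift.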
