
Sampling Strategies

Explore how sampling strategies influence the token selection process in generative AI. Understand beam search, top-k, and nucleus sampling methods, their trade-offs between determinism and diversity, and their practical use cases to optimize text generation for different applications.

When a language model generates text, it does not directly produce a single next word. Instead, at each step, it outputs a probability distribution over all tokens in its vocabulary. This distribution reflects how likely the model believes each token is, given the text generated so far.

For example, consider the prompt:

“The capital of France is”

After processing this prompt, the model might produce a probability distribution like the following:

[Figure: the model's probability distribution over candidate tokens]

The model has not committed to a single answer. It has expressed uncertainty by assigning probabilities to multiple possibilities. The question is: how do we turn this distribution into an actual token choice?

This decision is handled by the sampling strategy.

Greedy decoding

The most straightforward approach is greedy decoding. At each step, the model selects the token with the highest probability.

In the example above, greedy decoding would always select: “Paris.”

Greedy decoding is deterministic. Given the same prompt and model, it will always produce the same output. While this can be useful for debugging or tasks where variability is undesirable, it has important limitations.
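Greedy decoding can be sketched in a few lines of Python. The distribution below is a toy example, not the output of a real model:

```python
# Toy probability distribution over candidate next tokens
# (illustrative values, not real model output).
probs = {"Paris": 0.72, "Lyon": 0.12, "Marseille": 0.07, "London": 0.03}

def greedy_pick(distribution):
    # Greedy decoding: always select the highest-probability token.
    return max(distribution, key=distribution.get)

print(greedy_pick(probs))  # always prints "Paris"
```

Because there is no randomness, repeated calls with the same distribution always return the same token.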

Why greedy decoding is not enough

Greedy decoding works well for short, factual completions, but it often performs poorly for longer or more open-ended generation.

Consider a storytelling prompt:

“Once upon a time, there was a brave knight who”

At each step, the most probable token is often a common, safe continuation. Over many steps, this leads to outputs that are repetitive, generic, or overly cautious. The model tends to follow high-probability paths that quickly converge on predictable phrasing.

For example, greedy decoding may repeatedly favor tokens like:

  • “was”

  • “had”

  • “the”

This can result in text that feels dull or stuck in loops.

Controlled randomness

Sampling strategies address this issue by introducing controlled randomness into the selection of tokens. Instead of always choosing the most probable token, the model is allowed to sample from the distribution in a structured way.

Returning to the earlier example:

Token        Probability
Paris        0.72
Lyon         0.12
Marseille    0.07
London       0.03

A sampling-based approach might still choose “Paris” most of the time, but it allows lower-probability tokens like “Lyon” or “Marseille” to be selected occasionally. This variability becomes especially important when generating longer sequences, where early choices strongly influence later ones.
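A minimal sketch of this behavior, drawing tokens in proportion to their probabilities (the distribution values are the toy ones from the table above):

```python
import random

# Toy distribution from the table above.
probs = {"Paris": 0.72, "Lyon": 0.12, "Marseille": 0.07, "London": 0.03}

def sample_token(distribution, rng):
    tokens = list(distribution)
    weights = [distribution[t] for t in tokens]
    # random.choices normalizes the weights internally,
    # so they need not sum exactly to 1.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Draw many samples to see the proportions emerge.
rng = random.Random(0)  # fixed seed for reproducibility
counts = {t: 0 for t in probs}
for _ in range(10_000):
    counts[sample_token(probs, rng)] += 1
```

"Paris" dominates the counts, but lower-probability tokens like "Lyon" and "Marseille" appear a meaningful fraction of the time.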

All sampling strategies navigate the same fundamental trade-off:

  • Determinism vs. diversity

  • Coherence vs. creativity

  • Safety vs. exploration

Different strategies resolve this trade-off in different ways. Some prioritize the most likely sequences, while others deliberately allow for variation. In the following sections, we will examine how three commonly used approaches make this decision differently: beam search, top-k sampling, and nucleus (top-p) sampling.

Beam search

Beam search is a decoding strategy that produces more reliable, coherent outputs by simultaneously exploring multiple possible continuations. Instead of committing to a single token choice at each step, beam search keeps track of several promising partial sequences and expands them in parallel.

The key idea is simple: do not put all your probability mass on a single path too early.

The core idea behind beam search

Beam search maintains a fixed number of candidate sequences, called the beam width, usually denoted as B.

At each generation step:

  1. Every sequence in the beam is expanded by one token.

  2. All expanded sequences are scored using their cumulative probability.

  3. Only the top B sequences are kept.

  4. The rest are discarded.
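One step of this loop can be sketched in Python. The `next_token_probs` function below is a hypothetical stand-in for a language model's output distribution, hard-coded for illustration:

```python
import math

def next_token_probs(sequence):
    # Hypothetical stand-in for a model's next-token distribution.
    table = {
        (): {"The": 0.6, "A": 0.4},
        ("The",): {"capital": 0.7, "city": 0.3},
        ("A",): {"capital": 0.5, "king": 0.5},
    }
    return table[tuple(sequence)]

def beam_step(beams, B):
    # beams: list of (sequence, cumulative log-probability) pairs.
    candidates = []
    for seq, score in beams:
        for tok, p in next_token_probs(seq).items():
            # Summing log-probabilities multiplies probabilities,
            # which avoids numerical underflow on long sequences.
            candidates.append((seq + [tok], score + math.log(p)))
    # Keep only the B highest-scoring expanded sequences.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:B]

beams = [([], 0.0)]          # start with the empty sequence
beams = beam_step(beams, B=2)  # both one-token sequences survive
beams = beam_step(beams, B=2)  # only the top 2 two-token sequences survive
```

Note that scores are accumulated as log-probabilities, a standard trick since multiplying many probabilities directly would quickly underflow to zero.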

This process repeats ...