Sampling Strategies
Explore how sampling strategies influence token selection in generative AI. Understand beam search, top-k, and nucleus sampling, the trade-offs they make between determinism and diversity, and their practical use cases for optimizing text generation across applications.
When a language model generates text, it does not directly produce a single next word. Instead, at each step, it outputs a probability distribution over all tokens in its vocabulary. This distribution reflects how likely the model believes each token is, given the text generated so far.
For example, consider the prompt:
“The capital of France is”
After processing this prompt, the model might produce a probability distribution like the following:

| Token | Probability |
| --- | --- |
| Paris | 0.72 |
| Lyon | 0.12 |
| Marseille | 0.07 |
| London | 0.03 |
The model has not committed to a single answer. It has expressed uncertainty by assigning probabilities to multiple possibilities. The question is: how do we turn this distribution into an actual token choice?
This decision is handled by the sampling strategy.
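Concretely, the model emits one raw score (a logit) per vocabulary token, and a softmax turns those scores into the probability distribution shown above. Here is a minimal sketch, assuming NumPy; the token list and logit values are illustrative, not real model output:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw model scores (logits) into a probability distribution."""
    shifted = logits - logits.max()   # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical logits for four candidate tokens (made-up numbers).
tokens = ["Paris", "Lyon", "Marseille", "London"]
logits = np.array([4.0, 2.3, 1.7, 0.9])
for token, p in zip(tokens, softmax(logits)):
    print(f"{token}: {p:.2f}")
```

Every decoding strategy in this lesson starts from a distribution like this one; they differ only in how they pick a token from it.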
Greedy decoding
The most straightforward approach is greedy decoding. At each step, the model selects the token with the highest probability.
In the example above, greedy decoding would always select: “Paris.”
Greedy decoding is deterministic. Given the same prompt and model, it will always produce the same output. While this can be useful for debugging or tasks where variability is undesirable, it has important limitations.
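As a sketch, greedy decoding reduces to a single argmax over the distribution. The numbers below are taken from the example table above:

```python
# Toy next-token distribution for "The capital of France is".
probs = {"Paris": 0.72, "Lyon": 0.12, "Marseille": 0.07, "London": 0.03}

def greedy_decode(probs: dict[str, float]) -> str:
    """Always pick the single most probable token."""
    return max(probs, key=probs.get)

print(greedy_decode(probs))  # "Paris", on every run
```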
Why greedy decoding is not enough
Greedy decoding works well for short, factual completions, but it often performs poorly for longer or more open-ended generation.
Consider a storytelling prompt:
“Once upon a time, there was a brave knight who”
At each step, the most probable token is often a common, safe continuation. Over many steps, this leads to outputs that are repetitive, generic, or overly cautious. The model tends to follow high-probability paths that quickly converge on predictable phrasing.
For example, greedy decoding may repeatedly favor tokens like:
“was”
“had”
“the”
This can result in text that feels dull or stuck in loops.
Controlled randomness
Sampling strategies address this issue by introducing controlled randomness into the selection of tokens. Instead of always choosing the most probable token, the model is allowed to sample from the distribution in a structured way.
Returning to the earlier example:
| Token | Probability |
| --- | --- |
| Paris | 0.72 |
| Lyon | 0.12 |
| Marseille | 0.07 |
| London | 0.03 |
A sampling-based approach might still choose “Paris” most of the time, but it allows lower-probability tokens like “Lyon” or “Marseille” to be selected occasionally. This variability becomes especially important when generating longer sequences, where early choices strongly influence later ones.
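A minimal sketch of this behavior, sampling directly from the full distribution (real decoders typically truncate or reshape the distribution first, as the later sections show):

```python
import random
from collections import Counter

probs = {"Paris": 0.72, "Lyon": 0.12, "Marseille": 0.07, "London": 0.03}

def sample_token(probs: dict[str, float]) -> str:
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Over many draws, token frequencies track the distribution:
# roughly 72% "Paris", 12% "Lyon", and so on.
print(Counter(sample_token(probs) for _ in range(10_000)))
```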
All sampling strategies navigate the same fundamental trade-off:
Determinism vs. diversity
Coherence vs. creativity
Safety vs. exploration
Different strategies resolve this trade-off in different ways. Some prioritize the most likely sequences, while others deliberately allow for variation. In the following sections, we will examine three commonly used approaches: beam search, top-k sampling, and nucleus (top-p) sampling, and see how each one strikes this balance differently.
Beam search
Beam search is a decoding strategy that produces more reliable, coherent outputs by simultaneously exploring multiple possible continuations. Instead of committing to a single token choice at each step, beam search keeps track of several promising partial sequences and expands them in parallel.
The key idea is simple: do not put all your probability mass on a single path too early.
The core idea behind beam search
Beam search maintains a fixed number of candidate sequences, called the beam width, usually denoted B.
At each generation step:
Every sequence in the beam is expanded by one token.
All expanded sequences are scored using their cumulative probability.
Only the top B sequences are kept.
The rest are discarded.
This process repeats ...
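Here is a minimal sketch of that loop. The `step_fn` toy model is a stand-in assumption, returning the same handful of continuations at every step, and scores are accumulated as log-probabilities, the standard numerically stable way to multiply probabilities:

```python
import math

def beam_search(step_fn, beam_width: int = 3, max_steps: int = 5):
    """step_fn(seq) -> list of (token, prob) continuations for seq."""
    beams = [(0.0, [])]  # (cumulative log-probability, token sequence)
    for _ in range(max_steps):
        candidates = []
        for score, seq in beams:
            for token, p in step_fn(seq):  # expand each beam by one token
                candidates.append((score + math.log(p), seq + [token]))
        # Keep only the top `beam_width` sequences; discard the rest.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Hypothetical toy model: the same three continuations at every step.
def toy_step(seq):
    return [("knight", 0.5), ("dragon", 0.3), ("castle", 0.2)]

for score, seq in beam_search(toy_step, beam_width=2, max_steps=3):
    print(f"{math.exp(score):.3f}  {' '.join(seq)}")
```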