
Beam Search

Explore the beam search algorithm, a decoding strategy that maintains multiple candidate sequences to improve text generation quality over greedy search. Understand its advantages, limitations like length bias and repetition, and learn how to implement and tune beam search for balanced, coherent AI-generated text.

Beam search keeps showing up in interviews because decoding is as important as model training. Companies like OpenAI, Anthropic, Meta, Hugging Face, and Mistral want engineers who understand that generating text isn’t just about the model—it’s about how you explore the model’s probability space. Ask an LLM for text and it produces a probability distribution for the next token at every step. How you turn those probabilities into actual words is decoding, and decoding quality directly affects coherence, diversity, and style.

An interviewer brings up beam search to check whether you can reason about model outputs, diagnose issues like repetitive loops or blandness, and tune generation strategies. Decoding is where engineering intuition shines: knowing why a model picks certain paths, how to correct it, and when to switch strategies.

Before we get to beam search, it helps to understand what “search” even means in the context of LLMs.

What is search in LLMs?

When an LLM generates text, it does so one token at a time, and at each step it outputs a probability distribution over all possible next tokens. Deciding which token to pick from that distribution is decoding, or “search,” because we are effectively searching for the best sequence the model can produce. You can imagine the generation process as a tree of possibilities: the root is the start of your sequence, each tree level represents the next token position, and each branch is a possible token. The job of a decoding strategy is to traverse this tree in a smart way to build a complete sequence.
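
To make this concrete, here is a minimal sketch of a single decoding step. The `next_token_distribution` function below is a hypothetical stand-in for a model forward pass, not a real API; an actual LLM returns a distribution over tens of thousands of tokens, but the shape of the problem is identical.

```python
# Hypothetical toy "model": maps a partial sequence to a probability
# distribution over the next token. A real LLM does the same thing,
# just over a ~50,000-token vocabulary.
TOY_PROBS = {
    ("The",): {"cat": 0.4, "dog": 0.3, "weather": 0.2, "<end>": 0.1},
    ("The", "cat"): {"sat": 0.5, "ran": 0.3, "<end>": 0.2},
}

def next_token_distribution(sequence):
    # Unknown contexts simply end the sequence in this toy model.
    return TOY_PROBS.get(tuple(sequence), {"<end>": 1.0})

# Each call exposes one level of the generation tree:
print(next_token_distribution(["The"]))
# {'cat': 0.4, 'dog': 0.3, 'weather': 0.2, '<end>': 0.1}
```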

Greedy search is the most straightforward strategy: it takes the highest-probability token at each step. Greedy search is fast and easy, but it's shortsighted. Picking the locally most likely token can lead to a suboptimal overall sequence. For example, maybe the second-best token at one stage would lead to a much better sentence later. Greedy search "commits" to a path without considering alternatives, much like a person who always takes the seemingly best immediate move in a game, potentially missing a winning long-term strategy. Technically, greedy decoding doesn't always produce the globally most likely sequence under the model's probability distribution. It often results in generic or repetitive text because the model keeps choosing the obvious high-probability continuations.
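
As a sketch, greedy decoding is just a loop that commits to the argmax at every step. This reuses the toy `next_token_distribution` from above:

```python
def greedy_decode(sequence, max_new_tokens=10):
    # Greedy search: at each step, commit to the single most likely token.
    for _ in range(max_new_tokens):
        probs = next_token_distribution(sequence)
        best_token = max(probs, key=probs.get)  # locally optimal choice
        if best_token == "<end>":
            break
        sequence = sequence + [best_token]
    return sequence

print(greedy_decode(["The"]))
# ['The', 'cat', 'sat'] -- the locally best path, not necessarily the global one
```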

At the other extreme, one could attempt exhaustive search, evaluating every possible sequence to find the absolute top-probability one. But this is utterly infeasible for language generation: if your vocabulary has 50,000 tokens and you generate even a 20-token sentence, the number of possible sequences is astronomically high (50,000^20). Exhaustive search is exponentially expensive. Thus, practical decoding requires a heuristic search that balances quality and computational cost. This is where beam search comes in: a clever algorithm that lies between greedy and exhaustive search.
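
You can verify the blow-up directly; the count is just vocab_size raised to the sequence length:

```python
vocab_size, sentence_length = 50_000, 20
num_sequences = vocab_size ** sentence_length
print(f"{num_sequences:.2e}")
# 9.54e+93 -- far more sequences than atoms in the observable universe
```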

Quick answer for interviews: search refers to how we choose the next tokens when generating text. Greedy search picks the best immediate option, exhaustive search is computationally infeasible, and beam search balances efficiency with exploration.

What is beam search?

Beam search is a popular decoding algorithm that improves on greedy search by keeping multiple possibilities open at each step. The core idea is that instead of following just one path down the generation tree, we maintain K parallel candidates (where K is called the beam width). Instead of choosing the single best next token at each generation step, we look at the top K tokens and branch out. We temporarily explore ...
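
Here is a minimal sketch of that core loop, again reusing the hypothetical toy `next_token_distribution` from earlier. It is not a production implementation (real decoders batch this on GPU and typically add length normalization), but it shows the mechanics. Each candidate carries a cumulative log-probability score, since summing log-probabilities is numerically safer than multiplying raw probabilities:

```python
import math

def beam_search(start, beam_width=3, max_new_tokens=10):
    # Each beam is a (sequence, cumulative log-probability) pair.
    beams = [(list(start), 0.0)]
    completed = []
    for _ in range(max_new_tokens):
        # Expand every live beam by every possible next token.
        candidates = []
        for seq, score in beams:
            for token, p in next_token_distribution(seq).items():
                candidates.append((seq + [token], score + math.log(p)))
        # Prune: keep only the top-K candidates by cumulative score.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == "<end>":
                completed.append((seq, score))  # finished hypothesis
            else:
                beams.append((seq, score))
        if not beams:
            break
    completed.extend(beams)  # include unfinished beams as a fallback
    # Return the highest-scoring sequence found.
    return max(completed, key=lambda c: c[1])[0]

print(beam_search(["The"], beam_width=2))
# ['The', 'dog', '<end>'] -- total probability 0.3, beating the greedy
# path 'The cat sat <end>' (0.4 * 0.5 * 1.0 = 0.2) that commits too early.
```

Note how, even on this toy model, beam search recovers a globally more likely sequence that greedy search misses by committing to "cat" at the first step.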