Query Transformation and Multi-Query Retrieval
Explore query transformation techniques that enhance vector database retrieval by bridging the semantic gap between user queries and documents. Understand methods such as query rewriting, hypothetical document embeddings (HyDE), step-back prompting, and multi-query retrieval to improve search relevance and coverage. This lesson helps you evaluate trade-offs in latency and cost while optimizing retrieval for both specific and ambiguous queries.
Even with a well-tuned approximate nearest neighbor (ANN) index and the right distance metric, retrieval can still fail for a surprisingly simple reason: the query embedding itself does not capture what the user actually means. A user types a short, informal question, but the relevant answer lives inside a formal, technical paragraph that uses entirely different vocabulary. This mismatch between how people ask questions and how documents express answers is the core bottleneck that query transformation techniques are designed to solve.
Consider a concrete example. A user asks: "Why is my Lambda timing out?" The indexed document chunk, however, reads: "Execution duration exceeds the configured timeout threshold in AWS Lambda." These two sentences describe the same problem, but they share almost no words. When each is converted into an embedding vector, the resulting points in vector space may not be close enough for a top-k search to surface the document. This vocabulary mismatch between queries and documents is called the semantic gap.
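You can see the lexical side of this mismatch with a toy snippet. This is not an embedding model, just simple word overlap, but it makes the point: the two sentences above share a single token despite describing the same problem.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase and split on letter runs to get a bag of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

query = "Why is my Lambda timing out?"
doc = "Execution duration exceeds the configured timeout threshold in AWS Lambda."

q, d = tokens(query), tokens(doc)
overlap = q & d
jaccard = len(overlap) / len(q | d)

print(overlap)            # only 'lambda' is shared
print(round(jaccard, 3))  # near zero despite identical meaning
```

Dense embeddings close some of this gap, since "timing out" and "timeout threshold" land nearer in vector space than in token space, but short, colloquial queries still often embed far from long, formal passages.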
Note: The semantic gap is not a failure of the embedding model. It reflects a fundamental asymmetry: questions are short and colloquial, while documents are long and precise. Bridging this gap requires changing the query before it reaches the index.
Query transformation techniques address this bottleneck by reformulating the user’s query before it ever hits the vector index. The rest of this lesson walks through four such techniques: query rewriting, hypothetical document embeddings (HyDE), step-back prompting, and multi-query retrieval. Each operates upstream of the search step, improving what goes into the index lookup rather than filtering what comes out.
Query rewriting with an LLM
Query rewriting is the simplest transformation in this family. An LLM rephrases the user’s original query into a more precise, search-friendly form before that query is embedded and sent to the vector index.
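A minimal sketch of that flow is shown below. The prompt wording and the `call_llm` function are hypothetical stand-ins: in practice `call_llm` would wrap your chat-completion client of choice, and its stubbed return value here simply illustrates the kind of rewrite you would expect back.

```python
# Hypothetical rewrite prompt; tune the wording for your own corpus.
REWRITE_PROMPT = (
    "Rewrite the user's question as a precise, keyword-rich search query. "
    "Use the technical vocabulary the answer document is likely to use.\n\n"
    "Question: {query}\nRewritten query:"
)

def call_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call (OpenAI, Anthropic, etc.).
    # Returns a canned rewrite here so the sketch is self-contained.
    return "AWS Lambda execution duration exceeds configured timeout threshold"

def rewrite_query(query: str) -> str:
    """Ask the LLM for a search-friendly rewrite; embed *that* instead."""
    return call_llm(REWRITE_PROMPT.format(query=query)).strip()

rewritten = rewrite_query("Why is my Lambda timing out?")
print(rewritten)
```

The key design point is that only the rewritten string is embedded and sent to the vector index; the original query is kept around for display and for the final answer-generation step.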
How rewriting works
The mechanics are straightforward. The original query is sent to a language model (such as GPT-4 or Claude) along with a ...