Query Transformation and Multi-Query Retrieval
Explore query transformation techniques that enhance vector database retrieval by bridging the semantic gap between user queries and documents. Understand methods such as query rewriting, hypothetical document embeddings (HyDE), step-back prompting, and multi-query retrieval to improve search relevance and coverage. This lesson helps you evaluate trade-offs in latency and cost while optimizing retrieval for both specific and ambiguous queries.
Even with a well-tuned approximate nearest neighbor (ANN) index and the right distance metric, retrieval can still fail for a surprisingly simple reason: the query embedding itself does not capture what the user actually means. A user types a short, informal question, but the relevant answer lives inside a formal, technical paragraph that uses entirely different vocabulary. This mismatch between how people ask questions and how documents express answers is the core bottleneck that query transformation techniques are designed to solve.
Consider a concrete example. A user asks: "Why is my Lambda timing out?" The indexed document chunk, however, reads: "Execution duration exceeds the configured timeout threshold in AWS Lambda." These two sentences describe the same problem, but they share almost no words. When each is converted into an embedding vector, the resulting points in vector space may not be close enough for a top-k search to surface the document. This vocabulary mismatch between queries and documents is called the semantic gap.
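You can see the lexical side of this mismatch with a toy snippet. This is not an embedding model, just simple word overlap, but it makes the point: the two sentences above share a single token despite describing the same problem.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase and split on letter runs to get a bag of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

query = "Why is my Lambda timing out?"
doc = "Execution duration exceeds the configured timeout threshold in AWS Lambda."

q, d = tokens(query), tokens(doc)
overlap = q & d
jaccard = len(overlap) / len(q | d)

print(overlap)            # only 'lambda' is shared
print(round(jaccard, 3))  # near zero despite identical meaning
```

Dense embeddings close some of this gap, since "timing out" and "timeout threshold" land nearer in vector space than in token space, but short, colloquial queries still often embed far from long, formal passages.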
Note: The semantic gap is not a failure of the embedding model. It reflects a fundamental asymmetry: questions are short and colloquial, while documents are long and precise. Bridging this gap requires changing the query before it reaches the index.
Query transformation techniques address this bottleneck by reformulating the user’s query before it ever hits the vector index. The rest of this lesson walks through four such techniques: query rewriting, hypothetical document embeddings (HyDE), step-back prompting, and multi-query retrieval. Each operates upstream of the search step, improving what goes into the index lookup rather than filtering what comes out.
Query rewriting with an LLM
Query rewriting is the simplest transformation in this family. An LLM rephrases the user’s original query into a more precise, search-friendly form before that query is embedded and sent to the vector index.
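A minimal sketch of that flow is shown below. The prompt wording and the `call_llm` function are hypothetical stand-ins: in practice `call_llm` would wrap your chat-completion client of choice, and its stubbed return value here simply illustrates the kind of rewrite you would expect back.

```python
# Hypothetical rewrite prompt; tune the wording for your own corpus.
REWRITE_PROMPT = (
    "Rewrite the user's question as a precise, keyword-rich search query. "
    "Use the technical vocabulary the answer document is likely to use.\n\n"
    "Question: {query}\nRewritten query:"
)

def call_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call (OpenAI, Anthropic, etc.).
    # Returns a canned rewrite here so the sketch is self-contained.
    return "AWS Lambda execution duration exceeds configured timeout threshold"

def rewrite_query(query: str) -> str:
    """Ask the LLM for a search-friendly rewrite; embed *that* instead."""
    return call_llm(REWRITE_PROMPT.format(query=query)).strip()

rewritten = rewrite_query("Why is my Lambda timing out?")
print(rewritten)
```

The key design point is that only the rewritten string is embedded and sent to the vector index; the original query is kept around for display and for the final answer-generation step.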
How rewriting works
The mechanics are straightforward. The original query is sent to a language model (such as GPT-4 or Claude) along with a ...