Complex Retrieval: Hybrid Search and Hierarchical Indexing
Explore hybrid search methods that merge dense semantic and sparse lexical retrieval to improve document ranking. Understand hierarchical indexing strategies that balance retrieval precision with rich context, and discover Hypothetical Document Embeddings (HyDE), which bridge the stylistic gap between queries and documents. This lesson equips you to choose and apply advanced retrieval techniques tailored to diverse query and corpus characteristics, preparing you to build robust retrieval-augmented generation systems.
In the previous lesson, you learned how query transformation techniques like multi-query and step-back prompting can reshape a user’s question before it reaches the retrieval system. But what happens when the retrieval mechanism itself has gaps? Consider an enterprise knowledge base where a support engineer types: “How do I resolve error ERR-4092 when the authentication module fails silently?” This query blends natural language (“fails silently”) with an exact identifier (“ERR-4092”). A dense vector retriever understands the semantic meaning of “fails silently” but may rank documents containing the exact error code lower than vaguely related passages. A sparse lexical retriever like BM25 nails the error code match but misses documents that describe the same failure using different words. Neither retriever alone surfaces the best answer.
This gap motivates the three retrieval enhancements covered in this lesson. Hybrid search fuses dense and sparse retrieval so that each method covers the other's blind spots. Hierarchical indexing decouples the granularity at which the system searches from the context it hands to the LLM. Hypothetical Document Embeddings (HyDE) transform short queries into document-like passages before embedding, closing the stylistic mismatch between questions and indexed chunks.
Hybrid search with dense and BM25
Hybrid search runs two retrieval methods in parallel against the same corpus and merges their results into a single ranked list. The first method is dense vector retrieval, which captures semantic similarity; the second is sparse lexical retrieval with BM25, which rewards exact term matches such as error codes and identifiers.
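One widely used way to merge the two ranked lists is Reciprocal Rank Fusion (RRF). RRF is a common choice rather than the only one, and the document ids below are made up for illustration; the sketch shows the idea in plain Python:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in;
    k=60 is the constant from the original RRF paper and damps the
    influence of any single top-ranked position.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_results = ["doc_b", "doc_a", "doc_c"]   # from the vector index
sparse_results = ["doc_a", "doc_d", "doc_b"]  # from BM25
# doc_a leads the fused list: it ranks near the top of both inputs.
print(reciprocal_rank_fusion([dense_results, sparse_results]))
```

Because RRF works purely on rank positions, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.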
How the two-stage process works
When a query arrives, it follows two parallel paths. The dense path encodes the query through an embedding model and searches a k-NN (k-nearest neighbors) vector index. The sparse path tokenizes the query and searches an inverted index using BM25 scoring. Each path produces its own ranked list of candidates, and those lists are then merged into the single ranking returned to the rest of the pipeline.
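The two paths can be sketched end to end in plain Python. This is a minimal toy, not a production implementation: the corpus is invented, the "dense" path uses a bag-of-words cosine as a stand-in for a neural embedding model, and the merge here is a simple weighted sum of min-max-normalized scores (real engines often use RRF or their own native hybrid scoring):

```python
import math
import re
from collections import Counter

# Toy corpus: doc 0 contains the exact error code, doc 1 paraphrases
# the symptom, doc 2 is only vaguely related.
docs = [
    "Restart the gateway to clear ERR-4092 raised by the auth module.",
    "When authentication fails silently, check token expiry first.",
    "General networking tips for the knowledge base.",
]

def tokenize(text):
    return re.findall(r"[a-z0-9\-]+", text.lower())

# Sparse path: BM25 over term statistics (an inverted index in real engines).
def bm25(query, corpus, k1=1.5, b=0.75):
    toks = [tokenize(d) for d in corpus]
    avgdl = sum(map(len, toks)) / len(toks)
    df = Counter(t for d in toks for t in set(d))
    N = len(corpus)
    scores = []
    for doc in toks:
        tf = Counter(doc)
        s = 0.0
        for t in tokenize(query):
            if t in tf:
                idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (
                    tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

# Dense path: bag-of-words cosine stands in for a neural embedding + k-NN index.
def dense(query, corpus):
    def vec(text):
        return Counter(tokenize(text))
    def cos(u, v):
        dot = sum(c * v.get(t, 0) for t, c in u.items())
        norm = (math.sqrt(sum(c * c for c in u.values()))
                * math.sqrt(sum(c * c for c in v.values())))
        return dot / norm if norm else 0.0
    q = vec(query)
    return [cos(q, vec(d)) for d in corpus]

def normalize(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

query = ("How do I resolve error ERR-4092 when the "
         "authentication module fails silently?")
s = normalize(bm25(query, docs))
d = normalize(dense(query, docs))
fused = [0.5 * a + 0.5 * b for a, b in zip(s, d)]  # equal-weight merge
ranked = sorted(range(len(docs)), key=lambda i: -fused[i])
print(ranked)  # doc indices, best first; the vague doc 2 lands last
```

The equal 0.5/0.5 weighting is an assumption for the sketch; in practice the dense/sparse balance is a tuning knob that depends on how identifier-heavy your queries are.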