Indexing Algorithms: PQ, LSH, and HNSW
Explore how indexing algorithms like Product Quantization (PQ), Locality-Sensitive Hashing (LSH), and Hierarchical Navigable Small World (HNSW) optimize vector search speed, recall, and memory use. Understand their mechanisms, trade-offs, and ideal use cases to build efficient vector search systems in production AI applications.
The speed and recall of vector search in a production pipeline depend directly on one critical engineering decision: the indexing algorithm powering the approximate nearest neighbor (ANN) lookup. When a user query is embedded into a vector and sent to a vector database, the index structure determines how quickly and accurately the system finds the most relevant results. Brute-force search, which compares the query vector against every stored vector, delivers perfect accuracy but becomes computationally prohibitive at scale. Scanning millions of 768-dimensional float32 vectors one by one is simply not viable when your application needs sub-second response times.
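To make the brute-force baseline concrete, here is a minimal NumPy sketch of exact nearest-neighbor search. The corpus size, array names, and helper function are illustrative assumptions, not part of any specific library:

```python
import numpy as np

def brute_force_search(query: np.ndarray, corpus: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact k-NN: compare the query against every stored vector (O(N * d) work per query)."""
    distances = np.linalg.norm(corpus - query, axis=1)  # Euclidean distance to all N vectors
    return np.argsort(distances)[:k]                    # indices of the k closest vectors

# Toy corpus: 100,000 vectors of 768 float32 dimensions (illustrative scale only).
corpus = np.random.rand(100_000, 768).astype(np.float32)
query = np.random.rand(768).astype(np.float32)
print(brute_force_search(query, corpus, k=5))
```

At 100 million vectors, this full per-query scan is exactly the cost that the indexing algorithms below are designed to avoid.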
Production systems solve this by using indexing algorithms that trade a small amount of accuracy for massive speed gains. Three algorithms dominate the landscape, each optimizing for a different axis. Product Quantization (PQ) compresses vectors to shrink memory consumption. Locality-Sensitive Hashing (LSH) uses hash-based bucketing for sub-linear query speed. Hierarchical Navigable Small World (HNSW) graphs build layered navigation structures that deliver the highest recall. Services like Amazon OpenSearch Service with its k-NN plugin expose these algorithms as configurable options, making the choice of index a practical engineering decision rather than a theoretical exercise.
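Since the section mentions the OpenSearch k-NN plugin, here is a hedged sketch of what choosing an index type can look like in practice. It assumes the opensearch-py client; the domain endpoint, index name, field name, and parameter values are placeholders, not recommendations:

```python
from opensearchpy import OpenSearch

# Placeholder endpoint for an Amazon OpenSearch Service domain.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

index_body = {
    "settings": {"index": {"knn": True}},  # enable the k-NN plugin for this index
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,  # must match the embedding model's output size
                "method": {
                    "name": "hnsw",        # the indexing algorithm is a configuration choice
                    "space_type": "l2",
                    "engine": "faiss",
                    "parameters": {"m": 16, "ef_construction": 128},
                },
            }
        }
    },
}

client.indices.create(index="documents", body=index_body)
```

The point of the sketch is that the algorithm and its parameters are declared in the index definition, so the trade-offs discussed in this lesson map directly onto a few lines of configuration.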
This lesson examines each algorithm’s mechanics, strengths, and weaknesses, then compares them side by side so you can reason about the trade-offs when building your own vector search pipelines.
Product Quantization for memory compression
The memory problem
Consider a dataset of 100 million document embeddings, each with 768 dimensions stored as float32 values. A single vector consumes 768 × 4 bytes = 3,072 bytes (about 3 KB), so the full dataset needs roughly 307 GB of memory before any index structures are added.
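The arithmetic is easy to verify directly (variable names here are purely illustrative):

```python
dims = 768
bytes_per_float32 = 4
num_vectors = 100_000_000

bytes_per_vector = dims * bytes_per_float32        # 3,072 bytes, roughly 3 KB per vector
total_gb = bytes_per_vector * num_vectors / 1e9    # ~307 GB before any index overhead
print(bytes_per_vector, round(total_gb, 1))
```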
Product Quantization (PQ) is a lossy compression technique that dramatically reduces this ...