Indexing Algorithms: PQ, LSH, and HNSW
Explore how indexing algorithms like Product Quantization (PQ), Locality-Sensitive Hashing (LSH), and Hierarchical Navigable Small World (HNSW) optimize vector search speed, recall, and memory use. Understand their mechanisms, trade-offs, and ideal use cases to build efficient vector search systems in production AI applications.
The speed and recall of vector search in a production pipeline depend directly on one critical engineering decision: the indexing algorithm powering the approximate nearest neighbor (ANN) lookup. When a user query is embedded into a vector and sent to a vector database, the index structure determines how quickly and accurately the system finds the most relevant results. Brute-force search, which compares the query vector against every stored vector, delivers perfect accuracy but becomes computationally prohibitive at scale. Scanning millions of 768-dimensional float32 vectors one by one is simply not viable when your application needs sub-second response times.
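To make the brute-force baseline concrete, here is a minimal NumPy sketch of exact nearest-neighbor search. The corpus size, array names, and helper function are illustrative assumptions, not part of any specific library:

```python
import numpy as np

def brute_force_search(query: np.ndarray, corpus: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact k-NN: compare the query against every stored vector (O(N * d) work per query)."""
    distances = np.linalg.norm(corpus - query, axis=1)  # Euclidean distance to all N vectors
    return np.argsort(distances)[:k]                    # indices of the k closest vectors

# Toy corpus: 100,000 vectors of 768 float32 dimensions (illustrative scale only).
corpus = np.random.rand(100_000, 768).astype(np.float32)
query = np.random.rand(768).astype(np.float32)
print(brute_force_search(query, corpus, k=5))
```

At 100 million vectors, this full per-query scan is exactly the cost that the indexing algorithms below are designed to avoid.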
Production systems solve this by using indexing algorithms that trade a small amount of accuracy for massive speed gains. Three algorithms dominate the landscape, each optimizing for a different axis. Product Quantization (PQ) compresses vectors to shrink memory consumption. Locality-Sensitive Hashing (LSH) uses hash-based bucketing for sub-linear query speed. Hierarchical Navigable Small World (HNSW) graphs build layered navigation structures that deliver the highest recall. Services like Amazon OpenSearch Service with its k-NN plugin expose these algorithms as configurable options, making the choice of index a practical engineering decision rather than a theoretical exercise.
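Since the section mentions the OpenSearch k-NN plugin, here is a hedged sketch of what choosing an index type can look like in practice. It assumes the opensearch-py client; the domain endpoint, index name, field name, and parameter values are placeholders, not recommendations:

```python
from opensearchpy import OpenSearch

# Placeholder endpoint for an Amazon OpenSearch Service domain.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

index_body = {
    "settings": {"index": {"knn": True}},  # enable the k-NN plugin for this index
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,  # must match the embedding model's output size
                "method": {
                    "name": "hnsw",        # the indexing algorithm is a configuration choice
                    "space_type": "l2",
                    "engine": "faiss",
                    "parameters": {"m": 16, "ef_construction": 128},
                },
            }
        }
    },
}

client.indices.create(index="documents", body=index_body)
```

The point of the sketch is that the algorithm and its parameters are declared in the index definition, so the trade-offs discussed in this lesson map directly onto a few lines of configuration.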
This lesson examines each algorithm’s mechanics, strengths, and weaknesses, then compares them side by side so you can reason about the trade-offs when building your own vector search pipelines.
Product Quantization for memory compression
The memory problem
Consider a dataset of 100 million document embeddings, each with 768 dimensions stored as float32 values. A single vector consumes 768 × 4 bytes = 3,072 bytes (about 3 KB), so the full dataset needs roughly 307 GB of memory before any index structures are added.
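The arithmetic is easy to verify directly (variable names here are purely illustrative):

```python
dims = 768
bytes_per_float32 = 4
num_vectors = 100_000_000

bytes_per_vector = dims * bytes_per_float32        # 3,072 bytes, roughly 3 KB per vector
total_gb = bytes_per_vector * num_vectors / 1e9    # ~307 GB before any index overhead
print(bytes_per_vector, round(total_gb, 1))
```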
Product Quantization (PQ) is a lossy compression technique that dramatically reduces this ...