...

Scaling Search and Indexing

Explore how to resolve the scalability and resource-wastage issues caused by colocating indexing and searching. Design a high-performance distributed search system by separating these roles and using distributed storage. Learn to apply the MapReduce framework to parallelize index generation efficiently.

Problems with the proposed design

While the design from the previous lesson is functional, it has significant drawbacks regarding resource usage and scalability:

  1. Colocated indexing and searching: Running both operations on the same node causes resource contention. Since both indexing and searching are resource-intensive, they degrade each other’s performance. This design also prevents independent scaling of search and indexing resources based on load.

  2. Index recomputation: Computing the index independently on every replica wastes CPU cycles. Index construction is a heavy pipeline involving hundreds of operations, so repeating it on multiple machines multiplies that cost even though every replica ends up with the same index.
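As a rough way to quantify the waste, consider the following CPU-cost model. The symbols are introduced here for illustration and are not part of the design: R is the number of replicas, C_build is the CPU cost of running the indexing pipeline once, and C_ship is the per-replica cost of copying and loading an already-built index file. Network and storage costs are ignored.

$$
\text{CPU}_{\text{recompute}} = R \cdot C_{\text{build}}, \qquad
\text{CPU}_{\text{build once}} = C_{\text{build}} + R \cdot C_{\text{ship}}
$$

Building once is cheaper whenever $R \cdot C_{\text{build}} > C_{\text{build}} + R \cdot C_{\text{ship}}$, that is, whenever $C_{\text{ship}} < C_{\text{build}} \left(1 - \tfrac{1}{R}\right)$. Under the assumption that loading a file is far cheaper than rerunning the pipeline, this holds for any R ≥ 2.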

To address these issues, we need an alternative approach that decouples these operations.

Solution

Instead of recomputing the index on every replica, the system computes the inverted index once on the primary node. The resulting index file is then distributed to the replicas. This approach reduces CPU and memory usage by avoiding ...
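The compute-once, ship-to-replicas idea can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions, not the lesson's implementation. The names `build_inverted_index`, `serialize_index`, and `Replica` are hypothetical, the tokenizer is a toy regex, and the `pickle` bytes stand in for the index file that the primary would copy to replicas.

```python
import pickle
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")  # toy tokenizer (assumption)


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Expensive step: runs ONCE, on the primary node (hypothetical pipeline)."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in TOKEN_RE.findall(text.lower()):
            postings[term].add(doc_id)
    # Sorted posting lists keep the index deterministic across builds.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def serialize_index(index: dict[str, list[int]]) -> bytes:
    """Stand-in for writing the index file that gets distributed to replicas."""
    return pickle.dumps(index)


class Replica:
    """Hypothetical replica: loads a prebuilt index and only serves queries."""

    def __init__(self, index_blob: bytes):
        # pickle is only safe for trusted data, e.g. files from our own primary.
        self.index: dict[str, list[int]] = pickle.loads(index_blob)

    def search(self, query: str) -> list[int]:
        """Return IDs of documents containing ALL query terms (AND semantics)."""
        terms = TOKEN_RE.findall(query.lower())
        if not terms:
            return []
        result = set(self.index.get(terms[0], []))
        for term in terms[1:]:
            result &= set(self.index.get(term, []))
        return sorted(result)


if __name__ == "__main__":
    docs = {
        1: "distributed search systems",
        2: "search index on replicas",
        3: "distributed index",
    }
    blob = serialize_index(build_inverted_index(docs))  # built once on the primary
    replicas = [Replica(blob) for _ in range(3)]         # shipped, not rebuilt
    print(replicas[0].search("distributed index"))      # [3]
```

The design choice the sketch highlights is that `Replica` never calls `build_inverted_index`: the CPU-heavy pipeline runs exactly once, and each replica pays only the cost of deserializing the shared file.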