Evaluation of Distributed Search Design

Let's analyze how our design meets the requirements.

Availability

We use distributed storage to hold:

  • Documents collected by the crawler
  • Inverted indexes generated by the indexing nodes

The distributed storage replicates data across multiple regions, which makes it straightforward to deploy indexing and search in more than one location. We only need to replicate the group of indexing and search nodes in different availability zones. We therefore deploy a cluster of indexing and search nodes in each of several availability zones. If one zone fails, a cluster in another zone serves the requests. Running multiple groups of indexing and search nodes gives us high availability for both indexing and search. Within each cluster, if a node fails, another node takes over its work.
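The failover behavior described above can be sketched as a small query router. This is a minimal, single-process illustration under stated assumptions, not the design's actual implementation. The `Node`, `Cluster`, and `route_query` names, the zone names, and the `healthy` flag (a stand-in for a real health check) are all hypothetical. The router tries the caller's preferred availability zone first. Inside a cluster, any healthy node can replace a failed one. If the whole zone is down, the router falls back to clusters in other zones.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search node; `healthy` stands in for a real health check."""
    name: str
    healthy: bool = True


@dataclass
class Cluster:
    """Hypothetical group of indexing and search nodes in one availability zone."""
    zone: str
    nodes: list[Node] = field(default_factory=list)

    def pick_node(self) -> Node | None:
        # Within a cluster, any healthy node can take the place of a failed one.
        for node in self.nodes:
            if node.healthy:
                return node
        return None


def route_query(clusters: list[Cluster], preferred_zone: str) -> Node:
    """Route to the preferred zone, failing over to other zones if it is down."""
    # False sorts before True, so the preferred zone comes first.
    # sorted() is stable, so the other zones keep their original order.
    ordered = sorted(clusters, key=lambda c: c.zone != preferred_zone)
    for cluster in ordered:
        node = cluster.pick_node()
        if node is not None:
            return node
    raise RuntimeError("no healthy search node in any availability zone")
```

Example: with `clusters = [Cluster("zone-a", [Node("a1", healthy=False), Node("a2")]), Cluster("zone-b", [Node("b1")])]`, a query preferring `zone-a` lands on `a2`. If `a2` also fails, the same query fails over to `b1` in `zone-b`.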

Indexing is done offline, off the user's critical path. Search queries do not have to reflect data that was only just added to the index, so we don't need to replicate indexing operations synchronously. Search nodes keep answering queries from their current index while the new index replicates in the background. As a result, search stays available to users throughout.

After the latest index has been replicated to all locations and the search nodes have downloaded it, search queries run against the latest data.
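The following sketch illustrates the asynchronous update path described above. It is a single-process simulation under stated assumptions, not the design's actual code. The `IndexStore` and `SearchNode` classes, the integer index versions, the region names, and the `publish`/`replicate`/`refresh` methods are all hypothetical.

- The indexer publishes a new index version to one region and returns immediately, without waiting for the other regions.
- Replication to the other regions happens later.
- A search node keeps serving its current version. It downloads and switches to the new version only after that version exists in every region.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexStore:
    """Hypothetical distributed storage that tracks index versions per region."""
    regions: list[str]
    # Maps each region to the index versions that have finished replicating there.
    replicated: dict[str, set[int]] = field(default_factory=dict)

    def publish(self, version: int, home_region: str) -> None:
        # The indexer writes only to its home region and returns at once;
        # it does not wait for the other regions (asynchronous replication).
        self.replicated.setdefault(home_region, set()).add(version)

    def replicate(self, version: int, region: str) -> None:
        # Called later by a background replication process.
        self.replicated.setdefault(region, set()).add(version)

    def fully_replicated(self, version: int) -> bool:
        return all(version in self.replicated.get(r, set()) for r in self.regions)


@dataclass
class SearchNode:
    """Hypothetical search node that serves queries from its active index version."""
    store: IndexStore
    active_version: int = 0
    local_versions: set[int] = field(default_factory=lambda: {0})

    def refresh(self, candidate: int) -> None:
        # Switch only once the newer version is replicated everywhere;
        # until then, keep answering queries from the current index.
        if candidate > self.active_version and self.store.fully_replicated(candidate):
            self.local_versions.add(candidate)  # stands in for downloading index files
            self.active_version = candidate

    def search(self, query: str) -> str:
        return f"results for {query!r} from index v{self.active_version}"
```

Example flow: after `store.publish(1, "region-1")`, calling `node.refresh(1)` leaves the node on `v0` while `region-2` still lacks version 1, and queries keep being answered. Once `store.replicate(1, "region-2")` runs, the next `node.refresh(1)` switches the node to `v1`. This design never blocks a query on replication. The trade-off is that results can briefly lag behind the newest index.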
