Introduction

We discussed the design, building blocks, components, and the entire workflow of our system in the previous lesson, which brought up a number of interesting questions that now need answering. For example, what approach do we use to find the places, or how do we rank places based on their popularity? This lesson addresses important design concerns like the ones we just mentioned. The table given below summarizes the goals of this lesson.

Summary of the Lesson

Section/Sub-section	Purpose
Searching	This process divides the world into segments to optimize the search, so that all nearby sites in a given location and radius can be identified.
Storing Indexes	We index the places to improve query performance and estimate the storage required for all of the indexes.
Searching Using Segments	We search all the desired places by combining the segments.
Dynamic Segments	We solve the static segment limitations using dynamic segments.
Searching Using a QuadTree	We explore the searching functionality using a QuadTree.
Space Estimations for a QuadTree	We estimate the storage required for a QuadTree.
Data Partitioning	We look into the options we can use to partition data.
Ensuring Availability	We look into how we’ll ensure the availability of the system.
Inserting a New Place	We look into how we’ll insert the place in a QuadTree.
Ranking Popular Places	We rank the places of the system.
Evaluation	We evaluate how our system fulfills the non-functional requirements.

Searching

From Google Maps, we were able to connect segments and meet the scalability challenge to process a large graph efficiently. The graph of the world had numerous nodes and vertices, and traversing them was time-consuming. Therefore, we divided the whole world into smaller segments/subgraphs to process and query them simultaneously. The segmentation helps us improve the scalability of the system.

Each segment will be of the size $5 \times 5 \ \text{miles}$ and will contain a list of places that exist within it. We only search a few segments to locate destinations that are close by. We can use a given location and defined radius to find all the nearby segments and identify sites that are close.

We can store all the places in a table and uniquely identify a segment by having a segment_ID. We can index each segment in the database. Now, we have limited the number of segments we need to search, so the query will be optimized and return results quickly.

Improve data fetching

We use a key-value store for quick access to places in the segments. The key is the segment_ID, while the value contains the list of places in that segment. Let’s estimate how much space we need to store the indexes.

We can usually store a few MBs in the value of the key-value store. If we assume that each segment has 500 places then it will take up $500 \times 1296 \ \text{Bytes = 0.648MB}$ , and we can easily store it as a value.

The total area of the earth is around 200 million square miles, and the land surface area is about 60 million square miles. If we consider our radius to be ten miles, then we’ll have 6 million segments. An 8-Byte identifier will identify each segment.

Let’s calculate how much memory we need.

Total Area of the Earth (Million Square Miles)	60
Search Radius (Miles)	10
Number of Segments (Millions)	f6
Segment_ID (Bytes)	8
Place_ID (Bytes)	8
Number of Places (Millions)	500
Total Space (GB)	f4.048

Design Considerations of Yelp

Introduction

Summary of the Lesson

Searching

Improve data fetching

Search using segments