Design Considerations of Yelp
Learn about the different design aspects of the Yelp system.
Introduction
We discussed the design, building blocks, components, and the entire workflow of our system in the previous lesson, which brought up a number of interesting questions that now need answering. For example, what approach do we use to find the places, or how do we rank places based on their popularity? This lesson addresses important design concerns like the ones we just mentioned. The table given below summarizes the goals of this lesson.
Summary of the Lesson
Section/Sub-section | Purpose |
Searching | This process divides the world into segments to optimize the search, so that all nearby sites in a given location and radius can be identified. |
Storing Indexes | We index the places to improve query performance and estimate the storage required for all of the indexes. |
Searching Using Segments | We search all the desired places by combining the segments. |
Dynamic Segments | We solve the static segment limitations using dynamic segments. |
Searching Using a QuadTree | We explore the searching functionality using a QuadTree. |
Space Estimations for a QuadTree | We estimate the storage required for a QuadTree. |
Data Partitioning | We look into the options we can use to partition data. |
Ensuring Availability | We look into how we’ll ensure the availability of the system. |
Inserting a New Place | We look into how we’ll insert the place in a QuadTree. |
Ranking Popular Places | We rank the places of the system. |
Evaluation | We evaluate how our system fulfills the non-functional requirements. |
Searching
From Google Maps, we were able to connect segments and meet the scalability challenge to process a large graph efficiently. The graph of the world had numerous nodes and vertices, and traversing them was time-consuming. Therefore, we divided the whole world into smaller segments/subgraphs to process and query them simultaneously. The segmentation helps us improve the scalability of the system.
Each segment will be of the size
How will we find nearby places if we create a table for storing all places?
We can store all the places in a table and uniquely identify a segment by having a segment_ID
. We can index each segment in the database. Now, we have limited the number of segments we need to search, so the query will be optimized and return results quickly.
Improve data fetching
We use a key-value store for quick access to places in the segments. The key is the segment_ID
, while the value
contains the list of places in that segment. Let’s estimate how much space we need to store the indexes.
We can usually store a few MBs in the value of the key-value store. If we assume that each segment has 500 places then it will take up
, and we can easily store it as a value.
The total area of the earth is around 200 million square miles, and the land surface area is about 60 million square miles. If we consider our radius to be ten miles, then we’ll have 6 million segments. An 8-Byte identifier will identify each segment.
Let’s calculate how much memory we need.
Total Area of the Earth (Million Square Miles) | 60 |
Search Radius (Miles) | 10 |
Number of Segments (Millions) | f6 |
Segment_ID (Bytes) | 8 |
Place_ID (Bytes) | 8 |
Number of Places (Millions) | 500 |
Total Space (GB) | f4.048 |
Search using segments
A user may select a radius for searching places that aren’t present in a single segment. So, we need to combine multiple segments by ...