Design Twitter
Learn the system design of Twitter.
Overview
Twitter is a free microblogging social network with 397 million users as of 2021. One of the main reasons for Twitter's popularity is the vast sharing of breaking news on the platform.
In this section, we will describe the functional and non-functional requirements of Twitter. Then we'll look deeper into its API design and identify the different components of the Twitter architecture. In addition, we will discuss how to handle the Top-k problem, for example, finding the Tweets liked or viewed by millions of users on Twitter. Finally, we will explain how Twitter performs load balancing for its microservices system to manage billions of requests between the instances of its various services.
Requirements
Let’s understand the functional and non-functional requirements below:
Functional requirements
Registered users can post and delete one or more Tweets on Twitter.
They can like, dislike, and reply to Tweets.
Users can search Tweets by using keywords, hashtags, or usernames.
Users can follow or unfollow other users.
Users can view other users' timelines, which show those users' Tweets, or their own home timelines, which show the Tweets of the users they follow.
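The functional requirements above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the class name `MiniTwitter` and all of its method names are hypothetical, a simple counter stands in for a real ID generator, and nothing here reflects Twitter's actual API or storage.

```python
from collections import defaultdict
from itertools import count


class MiniTwitter:
    """Toy in-memory model of the functional requirements (hypothetical API)."""

    def __init__(self):
        self._ids = count(1)               # stand-in for a real ID generator
        self.tweets = {}                   # tweet_id -> (author, text)
        self.following = defaultdict(set)  # user -> set of users they follow
        self.likes = defaultdict(set)      # tweet_id -> set of users who liked it

    def post(self, user, text):
        tweet_id = next(self._ids)
        self.tweets[tweet_id] = (user, text)
        return tweet_id

    def delete(self, user, tweet_id):
        # Only the author may delete a Tweet.
        tweet = self.tweets.get(tweet_id)
        if tweet is not None and tweet[0] == user:
            del self.tweets[tweet_id]
            self.likes.pop(tweet_id, None)

    def like(self, user, tweet_id):
        if tweet_id in self.tweets:
            self.likes[tweet_id].add(user)

    def follow(self, follower, followee):
        self.following[follower].add(followee)

    def unfollow(self, follower, followee):
        self.following[follower].discard(followee)

    def _newest_first(self, keep):
        # IDs grow with posting order, so sorting by ID descending is newest first.
        return [t for t in sorted(self.tweets, reverse=True) if keep(*self.tweets[t])]

    def user_timeline(self, user):
        return self._newest_first(lambda author, text: author == user)

    def home_timeline(self, user):
        followed = self.following[user]
        return self._newest_first(lambda author, text: author in followed)

    def search(self, keyword):
        needle = keyword.lower()
        return self._newest_first(lambda author, text: needle in text.lower())
```

Scanning every Tweet on each read is acceptable only for a toy model; a real system would precompute or cache timelines, which is what the building blocks in the design are for.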
Non-functional requirements
Availability: The service Twitter provides needs to be highly available.
Latency: The latency of the distribution of Tweets to followers must be low.
Scalability: Twitter's workload is read-heavy, with roughly 1,000 reads for every write, so high storage capacity is needed to store and deliver Tweets posted by public figures to their millions of followers.
Reliability: Twitter needs to be highly reliable; no uploaded content should be lost or damaged.
Consistency: An effective technique is needed to give rapid feedback first to the user who performed an action (for example, liking someone’s post), then to other specified users in the same region, and finally to all users worldwide who are linked to the Tweet.
Design and building blocks
- DNS is the service that maps human-friendly Twitter domain names to machine-readable IP addresses.
- Load balancers distribute the read/write requests among the respective services.
- Sequencers generate the unique IDs for the Tweets.
- Databases store the metadata of Tweets and users.
- Blob stores store the images and video clips attached to the Tweets.
- Key-value stores are used for multiple purposes, such as indexing, identifying the specified counter to update its value, identifying the Tweets of a particular user, and many more.
- Pub-sub is used for real-time processing such as elimination of redundant data, organizing data, and much more.
- Sharded counters help to handle the counts of multiple features, such as viewing, liking, Retweeting, and so on, for accounts with millions of followers.
- A cache is used to store the most requested and recent data in RAM to give users a quick response.
- CDN helps end users to access the data with low latency.
- Monitoring analyses all outgoing and incoming traffic, identifies the redundancy in the storage system, figures out the failed node, and so on.
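To make the sharded counter from the list above more concrete, here is a minimal single-process sketch. It assumes a fixed number of shards and random shard selection; the class `ShardedCounter` and its methods are hypothetical names chosen for illustration, and in a real deployment each shard would live on a separate node or key rather than in one Python list.

```python
import random


class ShardedCounter:
    """Splits one logical counter (e.g. likes on a Tweet) across several shards."""

    def __init__(self, num_shards=8):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Writes go to a randomly chosen shard, so concurrent writers
        # rarely contend on the same shard.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def value(self):
        # Reads aggregate all shards to recover the total count.
        return sum(self.shards)


likes = ShardedCounter(num_shards=4)
for _ in range(1000):
    likes.increment()
print(likes.value())
```

The trade-off is that writes become cheap and spread out, while reads must sum every shard, so the total is often cached or aggregated periodically.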
High-level design
Let’s begin with the high-level design of our Twitter system.
Users post Tweets, which are delivered to the server through the load balancer. The system then stores them in persistent storage.
DNS provides the specified IP address to the end user to start communication with the requested service.
CDN is ...