System Design: Newsfeed System

Design a scalable newsfeed system by defining requirements and performing resource estimation for billions of users. Architect the core components, including feed generation, publishing, and ranking services. Analyze the final System Design to ensure compliance with low latency, high availability, and scalability requirements.

What is a newsfeed?

A newsfeed on any social media platform (Twitter, Facebook, Instagram) is a list of stories generated by the entities that a user follows. An entity could be a page, a group, a friend, or a follower of the user. The newsfeed contains text, images, videos, and other activities such as likes, comments, shares, and advertisements. This list is continuously updated and presented to relevant users on their home page. Similarly, a newsfeed system displays news to users from friends, followers, groups, and other pages, including a user’s own posts.

A newsfeed is a core feature of social platforms, aggregating recent posts and updates relevant to each user. It drives user engagement and repeat visits. These platforms operate at massive scale, serving billions of users. The engineering challenge is to deliver a personalized newsfeed in near real-time while maintaining scalability and high availability.

This lesson will discuss the high-level and detailed design of a newsfeed system for a social platform such as Facebook, Twitter, or Instagram.

Newsfeeds on a mobile application

Now that we understand what a newsfeed is and the challenges it presents, we will begin by defining the system’s requirements.

Requirements

To limit the scope of the problem, we’ll focus on the following functional and non-functional requirements:

Functional requirements

  • Newsfeed generation: The system will generate newsfeeds based on the pages, groups, and people that a user follows. A single user may follow or be connected to a large number of accounts, so the system must aggregate candidate posts from all relevant connections. The primary challenge is the volume of candidate content: the system must filter and rank this content to determine which items are surfaced first.

  • Newsfeed contents: The newsfeed may contain text, images, and videos.

  • Newsfeed display: The system should add new incoming posts to the newsfeed of every active user according to a ranking mechanism. Once posts are ranked, higher-ranked content is shown to the user first.
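The generation and display requirements above can be illustrated with a minimal sketch. It assumes each post already carries a relevance score; the `Post` type, the `score` field, and the `generate_feed` function are hypothetical names introduced here for illustration, not part of the design described in this lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    author: str   # entity (friend, page, or group) that created the post
    score: float  # hypothetical relevance score; assumed to be precomputed


def generate_feed(followed, posts_by_author, limit=10):
    """Aggregate candidate posts from followed entities and rank them."""
    candidates = []
    for entity in followed:
        # Only entities the user follows contribute candidate posts.
        candidates.extend(posts_by_author.get(entity, []))
    # Higher-ranked posts are shown first.
    candidates.sort(key=lambda p: p.score, reverse=True)
    return candidates[:limit]


posts_by_author = {
    "alice": [Post(1, "alice", 0.4), Post(2, "alice", 0.9)],
    "news_page": [Post(3, "news_page", 0.7)],
    "stranger": [Post(4, "stranger", 1.0)],  # not followed, so never surfaced
}
example_feed = generate_feed({"alice", "news_page"}, posts_by_author)
print([p.post_id for p in example_feed])  # [2, 3, 1]
```

A real ranking service would likely compute scores from many signals; the single precomputed number here only stands in for that step.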

Non-functional requirements

  • Scalability: Our proposed system should be highly scalable to support the ever-increasing number of users on any platform, such as Twitter, Facebook, and Instagram.

  • Fault tolerance: Because the system handles a large amount of data, partition tolerance (system availability in the event of network failure between the system’s components) is necessary.

  • Availability: The service must be highly available to keep users engaged with the platform. The system can compromise strong consistency for availability and fault tolerance, according to the PACELC theorem. The PACELC theorem is an extension of the CAP theorem which states that, in the event of a network partition, one must choose between availability and consistency; otherwise, one must choose between latency and consistency.

  • Low latency: The system should provide newsfeeds in near real time, so the maximum latency should not exceed 2 seconds.

These requirements, especially scalability, must be translated into concrete capacity targets. Capacity planning allows us to estimate expected traffic volume, storage footprint, and compute requirements.

Resource estimation

Assume the platform has 1 billion registered users, with an average of 500 million daily active users (DAU). Assume an average user has 300 friends and follows 250 pages. Using these assumptions, we can estimate traffic volume, storage footprint, and compute capacity.

Traffic estimation

Let’s assume that each daily active user opens the application (or social media page) 10 times a day. The total number of requests per day would be:

$500 \text{ M} \times 10 = 5$ billion requests per day $\approx 58 \text{ K}$ requests per second.
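The arithmetic above can be checked with a short script. The constants come from the assumptions stated in this section; the variable names are illustrative. Note that the result is an average rate over a full day, not a peak rate.

```python
# Assumptions taken from the text above.
DAILY_ACTIVE_USERS = 500_000_000
OPENS_PER_USER_PER_DAY = 10
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = DAILY_ACTIVE_USERS * OPENS_PER_USER_PER_DAY
requests_per_second = requests_per_day / SECONDS_PER_DAY

print(f"{requests_per_day:,} requests/day")            # 5,000,000,000 requests/day
print(f"{requests_per_second:,.0f} requests/second")   # 57,870 requests/second
```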

Traffic estimation for the newsfeed system

Storage estimation

Let’s assume the feed will be generated offline and rendered on demand. Also, we’ll precompute the top 200 posts for each user. Let’s calculate storage estimates for users’ metadata, text posts, and media content.
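One way to keep a bounded, precomputed list of top posts per user is a fixed-size min-heap. The following is a sketch under stated assumptions, not the lesson's prescribed implementation: the `PrecomputedFeed` class, its methods, and the per-post `score` are hypothetical, and only the limit of 200 posts comes from the text.

```python
import heapq

FEED_SIZE = 200  # top posts kept per user, as assumed in the text


class PrecomputedFeed:
    """Keeps only the highest-scoring posts for one user (hypothetical helper)."""

    def __init__(self, capacity=FEED_SIZE):
        self.capacity = capacity
        self._heap = []  # min-heap of (score, post_id); the weakest post is at index 0

    def offer(self, post_id, score):
        """Called offline whenever a new candidate post is scored."""
        entry = (score, post_id)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            # The new post beats the weakest stored post, so it replaces it.
            heapq.heapreplace(self._heap, entry)

    def render(self):
        """Called on demand: return post IDs, highest score first."""
        return [post_id for _, post_id in sorted(self._heap, reverse=True)]


# Small capacity so the eviction behavior is easy to follow.
demo_feed = PrecomputedFeed(capacity=3)
for post_id, score in [(1, 0.2), (2, 0.8), (3, 0.5), (4, 0.9), (5, 0.1)]:
    demo_feed.offer(post_id, score)
print(demo_feed.render())  # [4, 2, 3]
```

Each `offer` costs O(log k) for a feed of k posts, so the offline step stays cheap even when a user follows many entities, and `render` only sorts the small stored list.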

  1. Users’ metadata storage estimation: Suppose the storage required for one user’s metadata is 50 KB. For 1 billion users, we would need ...