4. Calculation & estimation

Assumptions

For the sake of simplicity, we can make these assumptions:

  • Video views per month are 150 billion.

  • 10% of videos watched are from recommendations, a total of 15 billion videos.

  • On the homepage, a user sees 100 video recommendations.

  • On average, a user watches two videos out of 100 video recommendations.

  • If a user does not click or watch a video within a given time frame, e.g., 10 minutes, it counts as a missed recommendation.

  • The total number of users is 1.3 billion.
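
The missed-recommendation rule above can be sketched as a small labeling function. This is only an illustration under stated assumptions: the names `label_impression` and `WATCH_WINDOW` are hypothetical, and the 10-minute window is the example value from the assumption list, not a fixed requirement.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed window, taken from the "10 minutes" example in the assumptions.
WATCH_WINDOW = timedelta(minutes=10)


def label_impression(
    shown_at: datetime,
    watched_at: Optional[datetime],
    window: timedelta = WATCH_WINDOW,
) -> int:
    """Return 1 (positive) if the recommended video was watched within
    `window` of being shown, otherwise 0 (missed recommendation)."""
    if watched_at is None:
        # The user never clicked or watched the video.
        return 0
    elapsed = watched_at - shown_at
    # A watch counts only if it happens after the impression and in time.
    return 1 if timedelta(0) <= elapsed <= window else 0


shown = datetime(2024, 1, 1, 12, 0, 0)
watched_in_time = label_impression(shown, shown + timedelta(minutes=3))
watched_too_late = label_impression(shown, shown + timedelta(minutes=25))
never_watched = label_impression(shown, None)
```

Here a watch three minutes after the impression is labeled positive, while a watch after 25 minutes, or no watch at all, is labeled negative.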

Data size

  • For 1 month, we collect 15 billion positive labels and roughly 750 billion negative labels.
  • Generally, we can assume that for every data point we collect, we also collect hundreds of features. For simplicity, assume each row takes 500 bytes to store. In one month, we need to store about 800 billion rows.
  • Total size: 500 * 800 * 10^9 = 4 * 10^14 bytes = 0.4 Petabytes. To save costs, we can keep the last six months or one year of data in the data lake, and archive old data in cold storage.
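
The arithmetic behind these figures can be checked with a short script. It is a back-of-envelope sketch that uses only the numbers stated in this section; the variable names are illustrative. Deriving impressions from the 2-in-100 watch rate gives 750 billion recommendations shown, so the exact negative count is about 735 billion, consistent with the rounded 750 billion and 800 billion figures used above.

```python
BILLION = 10**9

# Assumptions stated in this section.
monthly_views = 150 * BILLION
recommended_share_percent = 10   # share of views that come from recommendations
shown_per_page = 100             # recommendations shown on the homepage
watched_per_page = 2             # of which the user watches two
bytes_per_row = 500
rows_per_month = 800 * BILLION   # the section's rounded row count

# Positive labels: views that came from recommendations.
positives = monthly_views * recommended_share_percent // 100

# Recommendations shown, implied by the 2-in-100 watch rate.
impressions = positives * shown_per_page // watched_per_page
negatives_exact = impressions - positives

# Storage for one month and for the two retention options mentioned.
monthly_bytes = rows_per_month * bytes_per_row
monthly_petabytes = monthly_bytes / 10**15
six_month_petabytes = 6 * monthly_bytes / 10**15
one_year_petabytes = 12 * monthly_bytes / 10**15

print(f"positives: {positives:,}")
print(f"impressions: {impressions:,}")
print(f"monthly storage: {monthly_petabytes} PB")
```

Under these assumptions, keeping six months of data in the data lake takes about 2.4 petabytes, and keeping a full year takes about 4.8 petabytes.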

Bandwidth

  • Assume that every second we have to serve recommendation requests for 10 million users. Each request ranks 1k-10k videos.
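
To make the load concrete, the two numbers above can be multiplied to get the number of video scores the ranking service must produce per second. This is a rough sketch; `scores_per_second` is a hypothetical helper, and it assumes every candidate video in a request is scored exactly once.

```python
REQUESTS_PER_SECOND = 10_000_000   # 10 million users served per second
CANDIDATES_MIN = 1_000             # lower bound of videos ranked per request
CANDIDATES_MAX = 10_000            # upper bound of videos ranked per request


def scores_per_second(requests_per_second: int, candidates_per_request: int) -> int:
    """Number of (user, video) scores computed per second, assuming each
    candidate is scored once per request."""
    return requests_per_second * candidates_per_request


low = scores_per_second(REQUESTS_PER_SECOND, CANDIDATES_MIN)
high = scores_per_second(REQUESTS_PER_SECOND, CANDIDATES_MAX)

print(f"{low:.0e} to {high:.0e} scores per second")
```

That is on the order of 10^10 to 10^11 scores per second, which is why the cost of scoring a single candidate matters so much at this scale.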

Scale

  • Support 1.3 billion users

5. System design

High-level system design
