Requirements of WhatsApp Design

Understand and identify the functional and non-functional requirements for a chat application like WhatsApp.

Design problem

In today’s technological world, WhatsApp is an important messaging application that connects billions of people around the globe. For many users, the day starts with reading or sending WhatsApp messages to the people closest to them. However, some questions about WhatsApp come to mind, for example:

  • How is this application designed?
  • How does it work?
  • What are the different types of components involved in it?
  • How does WhatsApp enable billions of users to communicate with each other?
  • How does WhatsApp keep all that data secure?

In this chapter, we will focus on the high-level and detailed design of the WhatsApp application to answer the above questions. To limit the scope of the problem, we will look into the following functional and non-functional requirements.

Requirements

Our design of the WhatsApp messenger should meet the following requirements.

Functional requirements

  • Conversation: The system should support one-on-one and group conversations between users.

  • Acknowledgment: The system should support message delivery acknowledgment, i.e., sent, delivered, and read.

  • Sharing: The system should support sharing of media files, such as images, videos, and audio.

  • Chat storage: The system should support the persistent storage of chat (messages) when a user is offline until the successful delivery of messages.

  • Push notifications: The system should be able to notify offline users of new messages once their status becomes online.
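The acknowledgment and chat storage requirements above can be illustrated with a minimal in-memory sketch. Everything here is hypothetical and for illustration only: the names `Status`, `ChatServer`, `send`, `connect`, and `mark_read` are assumptions, not part of WhatsApp's actual design, and a real system would use persistent storage rather than Python dictionaries.

```python
from collections import defaultdict, deque
from enum import IntEnum


class Status(IntEnum):
    """Delivery states from the Acknowledgment requirement, in order."""
    SENT = 1
    DELIVERED = 2
    READ = 3


class ChatServer:
    """Hypothetical in-memory stand-in for a chat server (illustration only)."""

    def __init__(self):
        self.online = set()                # user IDs currently connected
        self.pending = defaultdict(deque)  # recipient -> undelivered messages
        self.status = {}                   # message ID -> Status

    def send(self, msg_id, recipient, text):
        self.status[msg_id] = Status.SENT
        if recipient in self.online:
            self._advance(msg_id, Status.DELIVERED)
        else:
            # Chat storage: keep the message until the recipient comes online.
            self.pending[recipient].append((msg_id, text))

    def connect(self, user):
        """Mark a user online and flush stored messages, oldest first."""
        self.online.add(user)
        delivered = []
        while self.pending[user]:
            msg_id, text = self.pending[user].popleft()
            self._advance(msg_id, Status.DELIVERED)
            delivered.append(text)
        return delivered

    def mark_read(self, msg_id):
        self._advance(msg_id, Status.READ)

    def _advance(self, msg_id, new_status):
        # Status only moves forward: a late "delivered" ack never undoes "read".
        if new_status > self.status[msg_id]:
            self.status[msg_id] = new_status
```

Using an ordered enum keeps the status monotonic, so acknowledgments that arrive late or out of order cannot move a message backward from read to delivered.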

Non-functional requirements

  • Low latency: Users should be able to receive messages with low latency.

  • Consistency: Messages should be delivered in the order in which the sender sent them. Moreover, users should see the same chat history on all of their devices.

  • Availability: The system should be highly available; however, the availability can be compromised in the interest of consistency.

  • Security: The system should be able to provide high security via end-to-end encryption. The end-to-end encryption ensures that only the two communicating parties can see the content of messages, and nobody in between, not even WhatsApp.
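One common way to meet the ordering part of the consistency requirement is to number the messages in a conversation and have the receiver release them strictly in that order. The sketch below assumes such a scheme: the class `OrderedInbox` and the rule that sequence numbers are consecutive integers starting at 1 are illustrative assumptions, not something this lesson specifies.

```python
class OrderedInbox:
    """Releases messages strictly in sequence-number order.

    Assumption: each message in a conversation is stamped with consecutive
    integers starting at 1 (a hypothetical scheme for illustration).
    """

    def __init__(self):
        self.next_seq = 1  # next sequence number that may be displayed
        self.buffer = {}   # seq -> text, for messages that arrived early

    def receive(self, seq, text):
        """Accept one message; return the messages that can now be shown."""
        if seq < self.next_seq or seq in self.buffer:
            return []  # duplicate of something already seen, ignore it
        self.buffer[seq] = text
        ready = []
        # Drain every message that is now contiguous with what was shown.
        while self.next_seq in self.buffer:
            ready.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

A message that arrives early is held back until the gap before it is filled, so the user never sees a reply before the message it answers.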

Capacity estimation

WhatsApp is the most used messaging application across the globe. According to WhatsApp, it supports more than 2 Billion users around the world who share more than 100 Billion messages each day. We need to estimate the storage capacity, bandwidth, and number of servers to support such an enormous number of users and messages.

Storage

As more than 100 Billion messages are shared per day over WhatsApp, let’s estimate the storage capacity based on this figure. Assume that each message takes 100 bytes on average. Moreover, the WhatsApp servers keep the messages for only 30 days. So, if a user doesn’t connect to the server within those 30 days, the messages are permanently deleted from the server.

$100\ \text{Billion/day} \times 100\ \text{bytes} = 10\ \text{TB/day}$

For 30 days this capacity would become:

$30 \times 10\ \text{TB/day} = 300\ \text{TB/month}$

This estimate covers only text messages; we have ignored media files, which take far more than 100 bytes each. Moreover, we also have to store users’ information and message metadata, e.g., timestamp, ID, etc. We also need encryption and decryption for secure communication; therefore, we would also need to store encryption keys and relevant metadata. So, to be precise, we need more than 300 TB per month, but for the sake of simplicity, let’s stick to the number 300 TB/month.

Bandwidth

According to the storage capacity estimation, our service will receive 10 TB of data each day, giving us an incoming bandwidth of about 926 Mb/s (megabits per second).

$10\ \text{TB} / 86400\ \text{s} \approx 926\ \text{Mb/s}$

For simplicity, we have ignored the media content (images, videos, documents, etc.); therefore, the number 926 Mb/s might seem low.

Since each incoming message needs to go out to another user, we will also require an equal amount of outgoing bandwidth.
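The bandwidth figure depends on a bytes-to-bits conversion that is easy to miss. A worked version of the calculation, assuming decimal units (1 TB = $10^{12}$ bytes) and Mb meaning megabits, looks like this:

$$
\begin{aligned}
10\ \text{TB/day} &= 10 \times 10^{12}\ \text{bytes/day} \times 8\ \text{bits/byte} = 8 \times 10^{13}\ \text{bits/day} \\
\frac{8 \times 10^{13}\ \text{bits/day}}{86400\ \text{s/day}} &\approx 9.26 \times 10^{8}\ \text{bits/s} \approx 926\ \text{Mb/s} \\
\text{Incoming} + \text{Outgoing} &\approx 2 \times 926\ \text{Mb/s} \approx 1.85\ \text{Gb/s}
\end{aligned}
$$

Note that this is an average over the whole day; peak traffic would be higher than the average.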

High Level Estimates

| Type                     | Estimates   |
|--------------------------|-------------|
| Total messages per day   | 100 Billion |
| Storage required per day | 10 TB       |
| Storage for 30 days      | 300 TB      |
| Incoming data per second | 926 Mb/s    |
| Outgoing data per second | 926 Mb/s    |

Servers

WhatsApp handles around 10 million connections on a single server, which seems quite high for a server; however, it is possible through extensive performance engineering. (For exact details, see the YouTube talk titled "C10M Defending The Internet at Scale" by Robert Graham.) We will need to know all the nitty-gritty of a system, such as a server’s kernel, networking library, infrastructure configuration, etc.

We can often optimize a general-purpose server for special tasks by careful performance engineering of the full software stack.

Let’s move to the estimation of the number of servers:

$\text{No. of servers} = \text{Total connections per day} / \text{No. of connections per server} = 2\ \text{Billion} / 10\ \text{Million} = 200\ \text{servers}$

So, according to the above estimates, we would require 200 chat servers.

Try yourself

Let’s analyze how the number of messages per day affects the storage and bandwidth requirements. For this purpose, you can change values in the following table to compute the estimates.

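| Parameter                                                | Value          |
|----------------------------------------------------------|----------------|
| Number of users per day (in Billions)                    | 2              |
| Number of messages per day (in Billions)                 | 100            |
| Size of each message (in Bytes)                          | 100            |
| Number of connections a server can handle (in Millions)  | 10             |
| Storage estimation per day (in TB)                       | 10 (computed)  |
| Incoming and Outgoing bandwidth (Mb/s)                   | 926 (computed) |
| Number of chat servers required                          | 200 (computed) |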
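To experiment with these inputs outside the table, the same arithmetic can be written as a short script. This is a sketch under stated assumptions: the function name `estimate` and its parameter names are made up for illustration, units are decimal (1 TB = 10^12 bytes), bandwidth is in megabits per second, and the server count is rounded up to a whole server.

```python
import math


def estimate(users_billion=2, messages_billion=100, message_bytes=100,
             connections_per_server_million=10):
    """Recompute the table's derived rows from its four inputs.

    Assumes decimal units (1 TB = 10**12 bytes) and bandwidth in megabits.
    """
    bytes_per_day = messages_billion * 10**9 * message_bytes
    storage_tb_per_day = bytes_per_day / 10**12
    # Convert bytes/day to bits/second, then to megabits/second.
    bandwidth_mbps = bytes_per_day * 8 / 86_400 / 10**6
    # Round up: a fraction of a server still needs a whole machine.
    servers = math.ceil(users_billion * 10**9 /
                        (connections_per_server_million * 10**6))
    return storage_tb_per_day, bandwidth_mbps, servers


# Sweep the message volume to see how storage and bandwidth scale with it.
for messages in (50, 100, 200):
    storage, bandwidth, servers = estimate(messages_billion=messages)
    print(f"{messages:>4} B msgs/day: {storage:5.1f} TB/day, "
          f"{bandwidth:7.1f} Mb/s each way, {servers} servers")
```

With the default inputs, the function reproduces the estimates above: 10 TB/day, about 926 Mb/s in each direction, and 200 servers. Storage and bandwidth grow linearly with the number of messages, while the server count depends only on the number of users and the connections each server can handle.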

Building blocks required

In the next lessons, we will focus on the high-level and detailed design of the WhatsApp messenger. The design will consist of many building blocks that have been discussed in the initial chapters. Before starting the next lesson, we assume that the learner has gone through the following lessons on building blocks.

  1. Load balancer
  2. Database
  3. Cache
  4. Messaging queue
  5. Blob storage
  6. CDN