A practitioner's guide to building scalable systems

Building Scalable Systems: Distributed Systems basics to System Design Interviews

We want to give you the tools to compare multiple approaches and pick the one that works for your scenario.

Grokking the Distributed and Scalable Systems

# What Individual Request Latencies tell us 

Let's look at the following latency percentiles (in seconds), calculated from 100 requests:

| P50 | P75 | P90 | P95 | P99 | P99.9 | P99.99 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 4 | 4 | 5 | 9.9 | 27.9 | 29.8 |

If you are looking at this data from individual requests made to the server, you can conclude the following:

1. 95% of the users are served within 5 seconds.
2. 99% of the users are served in less than 10 seconds.
3. 99.9% are served within 28 seconds.
4. 99.99% are served within 30 seconds.

Now, 28 seconds seems really high, but you know that it is only experienced by 1 in 1,000 users.

# Individual Latencies can mislead
There is a problem with the above assumption. When you load a website, the client sometimes makes hundreds of requests to load the page. Loading the Facebook home page could result in 10K+ queries to the backend if there's no caching. Thankfully, there is.

For simplicity, let's assume that your web app makes 100 requests to load the webpage. Each customer then has roughly 100 times the chance of hitting the 28-second tail latency that the per-request numbers suggested. Therefore, we can conclude:

1. Roughly 1 in 10 customers will see a latency of 28 seconds or more.
2. Roughly 1 in 100 customers will see a latency of ~30 seconds or more.
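The "1 in 10" and "1 in 100" figures can be checked with a quick calculation. The sketch below assumes that request latencies are independent of each other, which real systems only approximate; the function name `chance_of_hitting_tail` is made up for this illustration.

```python
def chance_of_hitting_tail(percentile: float, requests_per_page: int) -> float:
    """Probability that at least one request is slower than the given percentile.

    Assumes every request independently has a (percentile / 100) chance of
    finishing at or below that percentile's latency.
    """
    p_fast = percentile / 100.0
    # All requests are fast with probability p_fast ** n; the rest hit the tail.
    return 1.0 - p_fast ** requests_per_page


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        chance = chance_of_hitting_tail(pct, requests_per_page=100)
        print(f"P{pct}: {chance:.2%} of page loads include a slower request")
```

With 100 requests per page, the chance comes out slightly under 10% for P99.9 and slightly under 1% for P99.99, which is where the rounded "1 in 10" and "1 in 100" come from.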

Now the performance isn't looking so great if 10% of your customers experience such high latency.

# How to measure Latency
It's best to measure latency at the client side rather than at the individual request level. For example, measuring Time to Interactive for browser-based applications gives you a truer picture of the latency your clients experience. Your Time to Interactive might involve 5 requests or 100 requests, but either way you get a sense of the responsiveness that your customers typically experience while visiting your website.
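To see how far per-request percentiles can drift from what a client experiences, here is a small simulation sketch. Everything in it is an assumption for illustration: the latency distribution, the 0.1% chance of a slow request, and the simplification that a page is ready only when its slowest request returns (as if all requests were issued in parallel). It is not a model of any real browser metric.

```python
import math
import random


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = min(len(ordered), max(1, math.ceil(pct / 100 * len(ordered))))
    return ordered[rank - 1]


def simulate(page_loads=2_000, requests_per_page=100, slow_chance=0.001, seed=42):
    """Return (per-request latencies, per-page-load latencies) in seconds.

    Hypothetical model: a request is fast (0.5-5 s) unless it lands in the
    slow tail (20-30 s); a page load takes as long as its slowest request.
    """
    rng = random.Random(seed)
    request_latencies, page_latencies = [], []
    for _ in range(page_loads):
        latencies = [
            rng.uniform(20.0, 30.0) if rng.random() < slow_chance else rng.uniform(0.5, 5.0)
            for _ in range(requests_per_page)
        ]
        request_latencies.extend(latencies)
        page_latencies.append(max(latencies))
    return request_latencies, page_latencies


if __name__ == "__main__":
    req_lat, page_lat = simulate()
    for pct in (50, 95, 99):
        print(f"P{pct}: per request {percentile(req_lat, pct):.1f}s, "
              f"per page load {percentile(page_lat, pct):.1f}s")
```

Under these assumptions the per-request P99 stays in the fast range, while the per-page-load P95 already lands in the slow tail. That gap is what a client-side measurement captures and a per-request dashboard hides.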

# Tail Latency at Scale
When you are building an application for hundreds of millions or billions of users, your tail latencies become even more important. In the above example, 1 in 100 customers were experiencing a 30-second delay. If you have 100 million customers, that is 1 million customers experiencing 30-second delays. That's a lot of customers who are pissed off at your product.

# Tail Latencies can affect your most loyal users
Tail latencies are also important to look at because they are more likely to affect your most frequent users: the more often someone uses your product, the more chances they have of hitting a slow request. Here are a few example scenarios:

## Amazon Order Page
Amazon might be measuring the latency of its orders page.



