Put Back-of-the-Envelope Numbers in Perspective
Define BOTECs and their role in simplifying complex System Design problems. Explore foundational numbers, including server types, latencies, and request classifications (CPU, memory, IO-bound). Learn to abstract real-world complexity to quickly estimate critical metrics, such as requests per second (RPS).
Back-of-the-envelope calculations (BOTECs) are quick, approximate estimations typically performed on paper. While not intended to yield precise results, they provide a preliminary evaluation of a system’s feasibility and critical parameters.
For example, to estimate a neighborhood’s population, you might count houses in a sample area and multiply by the average household size. This is faster than conducting a full census. Similar calculations can validate data or test assumptions.
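The neighborhood estimate above can be sketched in a few lines; the sample counts below are made-up numbers for illustration:

```python
# Back-of-the-envelope population estimate (illustrative numbers).
houses_in_sample = 25        # houses counted in one sample block
blocks_in_neighborhood = 40  # rough count of similar blocks
avg_household_size = 3       # assumed average people per household

estimated_population = houses_in_sample * blocks_in_neighborhood * avg_household_size
print(estimated_population)  # 3000
```

The point is not the exact answer but that three rough inputs produce a usable estimate in seconds.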
BOTECs in system design
Modern systems consist of many interconnected computational components. Although architectures vary (monolithic, microservices), accounting for every node type, load balancer, cache, and database is impractical in a design interview. Back-of-the-envelope calculations help estimate capacity and resource requirements without modeling every infrastructure component.
Common estimations include:
Concurrent TCP connections a server can support.
Requests per second (RPS) a web, database, or cache server can handle.
Storage requirements for a service.
We use BOTECs to abstract hardware specifics, latencies, and throughput rates. We will first examine server types and latencies to understand the system’s reality. Then, we will simplify these details to estimate RPS, bandwidth, and storage capacity.
Types of data center servers
Data centers use commodity hardware to scale cost-effectively. Below, we discuss common server types used to handle different workloads:
Web servers
Web servers are the first point of contact after load balancers and typically handle API calls. They are decoupled from application servers for scalability. While memory and storage requirements are often moderate, web servers need strong processing power. For example, Facebook has used web servers with 32 GB RAM and 500 GB storage.
Application servers
Application servers execute business logic and generate dynamic content. They often require significant computational and storage resources. Facebook has deployed application servers with 256 GB RAM and 6.5 TB of hybrid storage (flash and rotating disk).
Storage servers
As data grows, services use specialized storage units. YouTube, for example, uses:
Blob storage: For encoded videos
Temporary processing queue storage: Holds daily video uploads pending processing
Bigtable: Specialized storage for video thumbnails
RDBMS: For metadata (comments, likes, user channels)
Other systems, like Hadoop’s HDFS, are used for analytics. Storage servers manage both structured (SQL) and unstructured (NoSQL) data.
Returning to the example of Facebook, they’ve used servers with storage capacities of up to 120 TB. With the number of servers in use, Facebook can store exabytes of data. (Note: One exabyte is 10^18 bytes, or one million terabytes.)
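To get a feel for the scale, a quick sketch of how many such storage servers one exabyte would require (the one-exabyte target is illustrative):

```python
# How many 120 TB storage servers does one exabyte require?
server_capacity_tb = 120
exabyte_in_tb = 10**6          # 1 EB = 10^18 bytes = 10^6 TB

servers_needed = exabyte_in_tb / server_capacity_tb
print(round(servers_needed))   # roughly 8,333 servers per exabyte
```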
Note: Data centers also require servers for configuration, monitoring, load balancing, analytics, accounting, and caching.
We need a reference point to ground our calculations. The table below depicts the capabilities of a typical server.
Typical Server Specifications

| Component | Specification |
| --- | --- |
| Processor | Intel Xeon (Sapphire Rapids 8488C) |
| Number of cores | 64 |
| RAM | 256 GB |
| Cache (L3) | 112.5 MB |
| Storage capacity | 16 TB |
Standard numbers to remember
Effective planning requires understanding the workloads machines can handle. Latency is a key factor in resource estimation. The table below outlines important numbers for system designers.
Important Latencies

| Component | Time (nanoseconds) |
| --- | --- |
| L1 cache reference | 0.9 |
| L2 cache reference | 2.8 |
| L3 cache reference | 12.9 |
| Main memory reference | 100 |
| Compress 1 KB with Snappy | 3,000 (3 microseconds) |
| Read 1 MB sequentially from memory | 9,000 (9 microseconds) |
| Read 1 MB sequentially from SSD | 200,000 (200 microseconds) |
| Round trip within same datacenter | 500,000 (500 microseconds) |
| Read 1 MB sequentially from a ~1 GB/sec SSD | 1,000,000 (1 millisecond) |
| Disk seek | 4,000,000 (4 milliseconds) |
| Read 1 MB sequentially from disk | 2,000,000 (2 milliseconds) |
| Send packet SF ⇄ NYC (round trip) | 71,000,000 (71 milliseconds) |
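A useful way to internalize the table is to normalize each latency to a single L1 cache reference. A small sketch using the numbers above:

```python
# Latencies in nanoseconds, taken from the table above.
latency_ns = {
    "L1 cache reference": 0.9,
    "Main memory reference": 100,
    "Read 1 MB from memory": 9_000,
    "Read 1 MB from SSD": 200_000,
    "Read 1 MB from disk": 2_000_000,
    "SF <-> NYC round trip": 71_000_000,
}

l1 = latency_ns["L1 cache reference"]
for name, ns in latency_ns.items():
    # Express every operation as a multiple of an L1 cache reference.
    print(f"{name}: {ns / l1:,.0f}x an L1 reference")
```

The spread covers almost eight orders of magnitude, which is why knowing which tier a request touches matters more than the exact nanosecond figures.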
Focus on the compression entry in the table above. Compression time is relatively consistent because the data usually fits within the processor’s L1, L2, or L3 caches. For instance, the typical server described earlier has an L3 cache of 112.5 MB; data fitting within this limit avoids the latency of fetching from slower memory or storage.
In addition to latency, throughput is measured as the number of queries per second (QPS) that a single server can handle.
Important Rates

| Server type | Approximate QPS |
| --- | --- |
| MySQL | 1,000 |
| Key-value store | 10,000 |
| Cache server | 100,000–1 million |
The numbers above are approximations. Real performance varies based on query type (e.g., point query vs. range query), machine specifications, database design, indexing, and server load.
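These per-server rates translate directly into capacity estimates. A minimal sketch, assuming a hypothetical service with a peak load of 50,000 QPS:

```python
import math

# Approximate per-server throughput, from the rates table above.
qps_per_server = {"mysql": 1_000, "key_value": 10_000, "cache": 100_000}

target_qps = 50_000  # assumed peak load for a hypothetical service

for kind, qps in qps_per_server.items():
    # Round up: a fraction of a server still requires a whole machine.
    servers = math.ceil(target_qps / qps)
    print(f"{kind}: {servers} server(s)")
```

In practice, we would add headroom for replication, failover, and peak-to-average ratios, but the order of magnitude comes straight from the table.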
Note: Initial designs rely on BOTECs. As designs evolve, we use reference numbers from synthetic workloads (e.g., TPC-C for database transactions) to validate assumptions. Finally, built-in monitoring helps identify bottlenecks and plan capacity. The Transaction Processing Performance Council Benchmark C (TPC-C) is a benchmark used to compare the performance of online transaction processing systems.
Referring to the throughput numbers above, how would you respond if an interviewer claimed that a MySQL database handles an average of 2,000 queries per second?
Request types
While we often estimate generic “requests,” real workloads fall into three categories: CPU-bound, memory-bound, and IO-bound.
CPU-bound requests: Limited by processing speed. Example: Compressing 1 KB of data. In our table, this takes 3 microseconds.
Memory-bound requests: Limited by the memory subsystem. For example, reading 1 MB sequentially from RAM. In our table, this takes 9 microseconds (3x slower than the CPU-bound example).
IO-bound requests: Limited by the IO subsystem (disk or network). For example, reading 1 MB sequentially from SSD. In our table, this takes 200 microseconds (~66x slower than the CPU-bound example).
To simplify calculations, we often approximate these differences as orders of magnitude: if a CPU-bound task takes on the order of microseconds, a memory-bound task takes several times longer, and an IO-bound task takes roughly two orders of magnitude longer.
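The ratios can be read straight off the latency table. A small sketch comparing the three example operations:

```python
# Example request costs in nanoseconds, from the latency table.
cpu_bound_ns = 3_000       # compress 1 KB of data (CPU-bound)
memory_bound_ns = 9_000    # read 1 MB sequentially from memory
io_bound_ns = 200_000      # read 1 MB sequentially from SSD

print(f"memory-bound: {memory_bound_ns / cpu_bound_ns:.0f}x the CPU-bound cost")
print(f"IO-bound: ~{io_bound_ns / cpu_bound_ns:.0f}x the CPU-bound cost")
```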
Abstracting away real system complexities
We have seen that real systems are complex. Considering every variable during a time-limited interview is impractical.
BOTECs are valuable for making high-level estimates and decisions early in the design process. Moving forward, we will focus on performing these calculations efficiently.
Request estimation in system design
We can estimate the number of requests a typical server can handle by calculating the CPU time required per request. A real request touches many nodes; for estimation, we lump all of that work into a single per-request figure.
The following equation estimates the CPU time to execute a program (request):

CPU time = Instruction count × CPI × Clock cycle time

For simplicity, we assume:

CPI (clock cycles per instruction) is 1.
The processor clock rate is 3.5 GHz (3.5 billion cycles per second).
An average request requires 3.5 million instructions (an assumed round figure; real requests vary widely).

Dimensional analysis confirms the result is in seconds:

Instruction count: number of instructions (a dimensionless count).
CPI: cycles per instruction (dimensionless).
Clock cycle time: seconds per cycle.

Multiplying these values gives the CPU time per program (request) in seconds.

First, we calculate the clock cycle time as the inverse of the frequency:

Clock cycle time = 1 / (3.5 × 10^9) ≈ 0.286 nanoseconds

Putting all the values together, we get:

CPU time = (3.5 × 10^6) × 1 × (1 / (3.5 × 10^9)) = 10^-3 seconds = 1 millisecond per request

Under these assumptions, a single core can serve roughly 1,000 requests per second.
Changing assumptions, such as the number of instructions per request, will change the final estimate. Without precise measurements, these approximations are typically sufficient for high-level capacity planning. This approach avoids modeling detailed CPU, memory, and I/O constraints. This level of abstraction is central to BOTECs.
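The per-request arithmetic can be sanity-checked in a few lines; the CPI of 1, the 3.5 GHz clock, and the 3.5-million-instruction request size are assumptions, not measurements:

```python
# Assumed values: CPI = 1, 3.5 GHz clock, 3.5 million instructions per request.
cpi = 1
clock_rate_hz = 3.5e9             # cycles per second
instructions_per_request = 3.5e6  # assumed average; real workloads vary widely

cpu_time_s = instructions_per_request * cpi / clock_rate_hz
rps_per_core = 1 / cpu_time_s

print(f"{cpu_time_s * 1e3:.1f} ms per request")
print(f"~{rps_per_core:.0f} requests/sec per core")
```

Doubling the instruction count halves the per-core RPS, which is exactly the kind of sensitivity a BOTEC is meant to expose.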
In the next lesson, we’ll use requests per second (RPS) to estimate related resources, including storage and network bandwidth.