How to estimate requests per second of a server

Requests per second (RPS) refers to the number of requests a server can handle in one second. RPS is a fundamental metric in back-of-the-envelope calculations (BOTECs), which are approximate calculations performed under simplified assumptions. They might not be precise, but they are a valuable tool for estimating resources in the early stages of designing a system. RPS provides insight into system capacity, resource allocation, scalability, and performance optimization. We previously estimated the time it takes to process a single clock cycle on a CPU with a frequency of 3.5 GHz: $1 / (3.5 \times 10^9\ Hz) \approx 0.286\ ns$. We'll use this number to estimate the RPS of different servers in this answer.
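
As a quick check of the cycle time used throughout, the duration of one clock cycle is the reciprocal of the clock frequency. The sketch below assumes nothing beyond the 3.5 GHz figure; the function name `cycle_time_ns` is illustrative.

```python
# One clock cycle lasts 1 / frequency seconds; convert to nanoseconds.
CPU_FREQUENCY_HZ = 3.5e9  # 3.5 GHz, from the text


def cycle_time_ns(frequency_hz: float) -> float:
    """Return the duration of one clock cycle in nanoseconds."""
    return 1e9 / frequency_hz


print(f"{cycle_time_ns(CPU_FREQUENCY_HZ):.3f} ns")  # 0.286 ns
```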

Types of requests

A server has limited resources, and depending on the type of client requests, different resources can become a bottleneck. Let's look at three types of requests.

CPU-bound requests: These are requests where the limiting factor is the CPU. Their throughput is bounded by how quickly the CPU can complete computationally intensive work.
Memory-bound requests: These requests are limited by the amount of memory a machine has. The machine's performance depends on memory access speed, and processing one byte of data from memory takes more clock cycles than processing it from the cache.
I/O-bound requests: These requests depend on input or output from other local or remote devices, such as querying databases or requesting data from other application servers.

The following terms are used in the memory-bound RPS calculation:

$RPS_{memory}$ : The number of memory-bound requests the server can handle per second
$RAM_{size}$ : The total size of the RAM
$Worker_{memory}$ : The memory occupied by one worker that manages a request. Workers are processes or threads responsible for executing a task or managing its completion.

It takes $16\ clock\ cycles$ to read or write 1 byte of data from or to memory. Let's suppose each request contains $100\ KB$ of data.
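
The memory-bound estimate can be sketched as follows. It assumes one plausible way to combine the terms above, $RPS_{memory} = \frac{RAM_{size}}{Worker_{memory}} \times \frac{1}{task\ time}$, where the task time is how long one worker spends reading or writing a request's bytes. The 16 GB of RAM, the 500 MB per worker, and reading 1 KB as 1,000 bytes are hypothetical values chosen for illustration, as are the function names.

```python
# Memory-bound RPS sketch. Values marked "assumed" are hypothetical, not from the text.
CPU_FREQUENCY_HZ = 3.5e9            # 3.5 GHz clock, from the text
MEMORY_CYCLES_PER_BYTE = 16         # cycles to read/write one byte from/to memory, from the text
REQUEST_SIZE_BYTES = 100 * 1000     # 100 KB per request (assumed: 1 KB = 1,000 bytes)
RAM_SIZE_BYTES = 16 * 10**9         # assumed: 16 GB of RAM
WORKER_MEMORY_BYTES = 500 * 10**6   # assumed: 500 MB per worker


def requests_per_worker(request_bytes: int, cycles_per_byte: int, frequency_hz: float) -> float:
    """Requests one worker completes per second if memory access is the only cost."""
    cycles_per_request = request_bytes * cycles_per_byte
    return frequency_hz / cycles_per_request


def rps_memory(ram_size: int, worker_memory: int, per_worker_rps: float) -> float:
    """Memory-bound RPS: whole workers that fit in RAM times each worker's RPS."""
    workers = ram_size // worker_memory
    return workers * per_worker_rps


per_worker = requests_per_worker(REQUEST_SIZE_BYTES, MEMORY_CYCLES_PER_BYTE, CPU_FREQUENCY_HZ)
print(f"time per request: {1000 / per_worker:.3f} ms")  # 0.457 ms
print(f"RPS per worker:   {per_worker:,.1f}")            # 2,187.5
print(f"RPS_memory:       {rps_memory(RAM_SIZE_BYTES, WORKER_MEMORY_BYTES, per_worker):,.0f}")  # 70,000
```

With these assumed numbers, 32 workers fit in RAM, each spending about 1.6 million cycles (roughly 0.457 ms) per request.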

Note: The RPS can vary depending on the server's total memory, the memory each worker takes, and the number of workers needed to handle a single request. Moreover, the time to perform an operation (read or write) can also vary with the data size.

I/O-bound RPS

As stated earlier, an I/O-bound request depends on I/O devices to perform its operations. The time taken to execute an operation varies with the location of those devices: within or outside a data center, zone, or region. A region is a geographical location, a zone is an isolated location within a region, and a data center is the physical facility that hosts resources in a zone. A region can have multiple zones, and a zone can have multiple data centers. The change in execution time of a simple request across these scenarios is shown in the following slides.

Note: The I/O-bound RPS of a server can vary depending on the number and location of resources it needs to communicate with.
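
A simple way to reason about I/O-bound RPS is that a worker waiting on a device completes one request per round trip, so its RPS is the reciprocal of the I/O latency. The sketch below assumes that waiting dominates (CPU and memory time are ignored); the 32 workers and the 0.5 ms same-data-center round trip are hypothetical inputs, not figures from the slides, and `rps_io` is an illustrative name.

```python
def rps_io(workers: int, io_latency_s: float) -> float:
    """I/O-bound RPS, assuming each worker blocks for io_latency_s per request
    and CPU or memory time is negligible."""
    return workers / io_latency_s


# Hypothetical inputs: 32 workers, 0.5 ms round trip to a device in the same data center.
print(f"{rps_io(32, 0.5e-3):,.0f} requests/s")  # 64,000 requests/s

# If a more distant device (hypothetically) took twice as long, RPS would halve.
print(f"{rps_io(32, 1.0e-3):,.0f} requests/s")  # 32,000 requests/s
```

Because RPS scales with the inverse of latency, every extra hop to another zone or region cuts the I/O-bound RPS proportionally.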

Cache RPS

User requests are often repetitive, so the server can respond with cached data instead of accessing other resources. This saves time and lets the server respond to more user requests. The cache latency is approximately $4\ clock\ cycles$ for a cache line size of 2 bytes; that is, it takes $4\ clock\ cycles$ to read 2 bytes of data from the cache, or $2\ clock\ cycles$ per byte.
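
The cache figures can be turned into a per-worker RPS the same way as the memory-bound case. This sketch assumes a cached response is also 100 KB (1 KB = 1,000 bytes) and that reading it is the only cost; `requests_per_worker` is an illustrative helper, and the comparison uses the 16 cycles per byte for memory stated earlier.

```python
# Cache RPS sketch: 4 cycles per 2-byte cache line (from the text) = 2 cycles per byte.
CPU_FREQUENCY_HZ = 3.5e9
CACHE_CYCLES_PER_LINE = 4
CACHE_LINE_BYTES = 2
MEMORY_CYCLES_PER_BYTE = 16        # memory cost from the memory-bound section, for comparison
RESPONSE_SIZE_BYTES = 100 * 1000   # assumed: a cached response is also 100 KB


def requests_per_worker(response_bytes: int, cycles_per_byte: float, frequency_hz: float) -> float:
    """Requests one worker serves per second when data transfer is the only cost."""
    return frequency_hz / (response_bytes * cycles_per_byte)


cache_cycles_per_byte = CACHE_CYCLES_PER_LINE / CACHE_LINE_BYTES
cache_rps = requests_per_worker(RESPONSE_SIZE_BYTES, cache_cycles_per_byte, CPU_FREQUENCY_HZ)
memory_rps = requests_per_worker(RESPONSE_SIZE_BYTES, MEMORY_CYCLES_PER_BYTE, CPU_FREQUENCY_HZ)
print(f"cache RPS per worker: {cache_rps:,.0f}")               # 17,500
print(f"speedup over memory:  {cache_rps / memory_rps:.1f}x")  # 8.0x
```

Under these assumptions, serving from the cache is 8 times faster per byte than going to memory (2 versus 16 cycles).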

Estimating servers

These per-type RPS estimates are simplifications meant to aid understanding. In practice, a request is a combination of all these bounds. We can derive formulas to calculate the total RPS of a server and the number of servers required to handle a given number of requests as follows:

Suppose there are $K$ total requests per second to be handled by a server. Of those $K$ requests, the server's cache handles 20% and responds directly.
The server handles the remaining requests with different bounds. Let's say 20% of the requests are CPU-bound, 20% are memory-bound, 20% are I/O-bound, and the remaining 20% are hybrid requests (requests that require all the components to work together to complete).
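
One common way to combine these bounds, sketched here and not necessarily the exact formulas intended, is to treat each request as consuming server capacity: if type $t$ makes up a fraction $f_t$ of traffic and a server dedicated to that type could handle $RPS_t$, each such request uses $1/RPS_t$ seconds of capacity. The blended server RPS is then $1 / \sum_t (f_t / RPS_t)$, and the number of servers is $\lceil K / RPS_{server} \rceil$. The per-type capacities below are hypothetical placeholders (the hybrid value in particular is not given), and `server_rps` and `servers_needed` are illustrative names.

```python
import math


def server_rps(mix: dict[str, float], rps_by_type: dict[str, float]) -> float:
    """Blended RPS of one server for a request mix.

    A type-t request uses 1 / rps_by_type[t] seconds of server capacity, so the
    average cost per request is sum(f_t / rps_t) and its inverse is the blended RPS.
    """
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("request-mix fractions must sum to 1")
    seconds_per_request = sum(f / rps_by_type[t] for t, f in mix.items())
    return 1 / seconds_per_request


def servers_needed(k: float, per_server_rps: float) -> int:
    """Number of servers needed to absorb K requests per second, rounded up."""
    return math.ceil(k / per_server_rps)


# Request mix from the text: 20% each for cache, CPU, memory, I/O, and hybrid.
MIX = {"cache": 0.2, "cpu": 0.2, "memory": 0.2, "io": 0.2, "hybrid": 0.2}
# Hypothetical per-type capacities of one server (placeholders, not measurements).
RPS_BY_TYPE = {"cache": 100_000, "cpu": 20_000, "memory": 70_000, "io": 64_000, "hybrid": 10_000}

rps = server_rps(MIX, RPS_BY_TYPE)
print(f"blended RPS per server: {rps:,.0f}")
print(f"servers for K = 1,000,000: {servers_needed(1_000_000, rps)}")
```

This blend is conservative: it treats every request type as competing for the same capacity, whereas CPU-, memory-, and I/O-bound requests stress different resources and can partly overlap in practice.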