Requests per second (RPS) refers to the number of requests a server can handle in one second. RPS is a fundamental metric in
A server has limited resources, and depending on the type of client request, a different resource can become the bottleneck. Let’s look at three types of requests.
CPU-bound requests: These are requests whose limiting factor is the CPU. They are computationally intensive, so their throughput is bounded by CPU performance.
Memory-bound requests: These requests are limited by the amount of memory a machine has. Their performance depends on memory access speed, and processing each byte of data can take more clock cycles.
I/O-bound requests: These requests depend on input or output from other local or remote devices, such as querying databases or requesting data from other application servers.
Let’s estimate the requests per second a server can handle for each request type.
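A CPU-bound request requires extensive computation by the CPU to complete. Examples include matrix multiplication, cryptography, and data compression, which can take millions of CPU clock cycles per request. The RPS for such requests is calculated as follows:

$$RPS_{CPU} = Num_{CPU} \times \frac{1}{Task_{time}}, \qquad Task_{time} = Cycles_{req} \times T_{cycle}$$

The following terms are used in this calculation:
$RPS_{CPU}$: the number of CPU-bound requests the server handles per second.
$Num_{CPU}$: the number of CPU cores.
$Task_{time}$: the time one core takes to complete a single request.
$Cycles_{req}$: the number of clock cycles a request needs.
$T_{cycle}$: the duration of one clock cycle, that is, $1/\text{clock frequency}$.

Suppose a request takes 8 million clock cycles. For a system with 36 cores and a 3.5 GHz clock, the RPS is calculated as follows:

$$Task_{time} = 8 \times 10^{6} \times \frac{1}{3.5 \times 10^{9}\ \text{Hz}} \approx 2.29\ \text{ms}$$

$$RPS_{CPU} = 36 \times \frac{1}{Task_{time}} = \frac{36 \times 3.5 \times 10^{9}}{8 \times 10^{6}} = 15{,}750$$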
Note: The RPS can vary depending on the nature of the request (the clock cycles it needs) and the CPU’s architecture, such as its number of cores and clock frequency.
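To make the arithmetic concrete, here is a minimal Python sketch of the CPU-bound formula. It assumes each core works on one request at a time and ignores scheduling, memory stalls, and other overheads; the function name `cpu_rps` is hypothetical.

```python
def cpu_rps(num_cores: int, clock_hz: float, cycles_per_request: float) -> float:
    """RPS_CPU = Num_CPU * (1 / Task_time), where Task_time = cycles * (1 / clock_hz)."""
    task_time_s = cycles_per_request / clock_hz  # seconds one core spends on one request
    return num_cores / task_time_s               # each core finishes 1 / Task_time requests per second


# Example from the text: 8 million cycles per request, 36 cores, 3.5 GHz.
task_time_ms = 8e6 / 3.5e9 * 1e3
print(f"Task time: {task_time_ms:.2f} ms")         # Task time: 2.29 ms
print(f"RPS_CPU: {cpu_rps(36, 3.5e9, 8e6):,.0f}")  # RPS_CPU: 15,750
```

Doubling either the core count or the clock frequency doubles the estimate, while doubling the cycles per request halves it.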
For memory-bound requests, the limiting factor is memory: the number of clock cycles an operation takes depends on memory access speed and can therefore be higher. Reading or writing a 1-byte number to/from memory takes
The following terms are used in this calculation:
It takes
In that case, the time to handle a single request is as follows:
Suppose a system has 32 GB of memory and each worker uses approximately 1 GB to handle requests. Several workers can cooperate on a single operation; here, let’s suppose two workers handle each memory-bound request. The RPS for memory-bound requests is then:
Note: The RPS can vary depending on the total memory of the server and the memory each worker takes as well as the number of workers to handle a single request. Moreover, the time to perform an operation (read or write) can also vary according to data size.
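The exact formula and per-operation timings for this step are not preserved in this text, so the following Python sketch assumes one plausible model: memory limits how many workers fit on the server, each request occupies `workers_per_request` workers for `task_time_ms`, and throughput is the number of requests in flight divided by the task time. The function `memory_rps` and the 0.5 ms task time are hypothetical; with these inputs the model yields the 32,000 RPS listed in the summary table.

```python
def memory_rps(total_mem_gb: float, worker_mem_gb: float,
               workers_per_request: int, task_time_ms: float) -> float:
    """Estimate memory-bound RPS under a simple, assumed model.

    Memory caps the number of workers; each request ties up
    `workers_per_request` workers for `task_time_ms` milliseconds.
    """
    workers = int(total_mem_gb // worker_mem_gb)          # workers that fit in RAM
    concurrent_requests = workers // workers_per_request  # requests served at the same time
    return concurrent_requests * 1000 / task_time_ms      # requests per second


# 32 GB server, ~1 GB per worker, 2 workers per request (from the text).
# The 0.5 ms task time is a hypothetical placeholder.
print(f"{memory_rps(32, 1, 2, 0.5):,.0f}")  # 32,000
```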
An I/O-bound request, as stated earlier, depends on I/O devices to perform its operations. Now, depending on the location (within or outside of a
Note: In general, a query takes approximately
to execute on a MySQL server. We considered for the use case.
Based on the latency of each operation involved, the RPS for I/O-bound requests can be estimated as follows:
So, if a server directly communicates with a database within a data center, the RPS is:
If a server communicates with a database through another server within a data center, the RPS is:
Note: The I/O-bound RPS of a server can vary depending on the number and location of resources it needs to communicate with.
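A rough way to reason about I/O-bound RPS is to add up the latencies a request waits on and divide the available concurrency by that total. The Python sketch below assumes each worker handles one request at a time and spends negligible CPU time; `io_rps`, `RTT_MS`, and `QUERY_MS` are hypothetical names, and the latency values are placeholders rather than the figures this lesson uses. It shows why an extra hop through another server lowers RPS.

```python
from typing import Sequence


def io_rps(latencies_ms: Sequence[float], concurrency: int = 1) -> float:
    """Estimate I/O-bound RPS from the latencies on a request's path.

    Assumes each of `concurrency` workers handles one request at a time,
    waits for the listed I/O steps one after another, and does negligible CPU work.
    """
    per_request_ms = sum(latencies_ms)          # total time one request spends waiting
    return concurrency * 1000 / per_request_ms  # convert ms per request into requests per second


RTT_MS = 0.5    # hypothetical round trip between two machines in one data center
QUERY_MS = 1.0  # hypothetical database query execution time

direct = io_rps([RTT_MS, QUERY_MS])              # server -> database
via_server = io_rps([RTT_MS, RTT_MS, QUERY_MS])  # server -> another server -> database
print(f"{direct:.0f} vs {via_server:.0f}")       # 667 vs 500
```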
User requests are often repetitive, so the server can respond with cached data instead of accessing the underlying resources. This saves time and lets the server respond to more user requests. The cache latency is approximately
So, to read a response of
The RPS of cached requests is as follows:
We can estimate the requests per second of a server that handles hybrid requests and takes
We can estimate the number of servers required to process K requests per second. For
Therefore, approximately 45 servers would be required to handle 12,000 requests per second, each with
| Request type | Estimated RPS per server |
|---|---|
| CPU requests | ≈15,750 |
| Memory requests | 32,000 |
| I/O requests | 2,500–4,000 |
| Cache requests | 17,500 |
| Hybrid requests | 267–9,612 |
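The server count follows from dividing the target load K by the RPS one server sustains and rounding up to a whole server. Below is a minimal Python sketch; the function name `servers_needed` is hypothetical, and it assumes 267 RPS per server (the low end of the hybrid range above), which agrees with the estimate of about 45 servers for 12,000 requests per second.

```python
import math


def servers_needed(k_rps: float, rps_per_server: float) -> int:
    """Number of servers = K / RPS per server, rounded up to whole servers."""
    return math.ceil(k_rps / rps_per_server)


# Hypothetical assumption: each server sustains 267 RPS (low end of the hybrid range).
print(servers_needed(12_000, 267))  # 45
```

Rounding up matters: 12,000 / 267 is about 44.9, and a fractional server cannot be provisioned.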
These RPS estimates are only meant to build a learner’s intuition. In practice, requests combine all of these bounds. We can derive formulas to calculate the total RPS of a server and the number of servers required to handle a given number of requests as follows:
Suppose there are
The server handles the remaining requests with different bounds. Let’s say 20% of the requests are CPU bound, 20% are memory bound, 20% are I/O bound, and the remaining 20% are
Let’s calculate the number of requests of each type as follows:
Using equations 1–5, we can calculate the number of servers required to handle an application’s requests with the following equation:
The percentage of each bound can vary depending on the use case or nature of the requests. Moreover, the RPS of CPU, memory, I/O, or cache can also vary depending on the clock cycles a request takes for completion.
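The lesson’s equations 1–5 are not preserved in this text. One straightforward way to combine per-type estimates, assumed here, is to compute how many servers each request type would need on its own (its requests divided by its RPS) and add the results. In the Python sketch below, `servers_for_mix` and `RPS_BY_TYPE` are hypothetical names, the per-type RPS values are this lesson’s estimates, and the request mix is made up for illustration because the text’s own split is partly truncated.

```python
import math
from typing import Mapping

# Per-server RPS by request type, taken from this lesson's estimates
# (CPU from the 36-core example, I/O at the conservative low end of its range).
RPS_BY_TYPE = {"cpu": 15_750, "memory": 32_000, "io": 2_500, "cache": 17_500}


def servers_for_mix(k_rps: float, mix: Mapping[str, float]) -> int:
    """Sum over request types of (requests of that type / RPS for that type), rounded up."""
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("request shares must add up to 100%")
    # Servers each type would need on its own, added across types.
    load_in_servers = sum(k_rps * share / RPS_BY_TYPE[kind] for kind, share in mix.items())
    return math.ceil(load_in_servers)


# Hypothetical mix: 40% cached, 20% CPU-bound, 20% memory-bound, 20% I/O-bound.
mix = {"cache": 0.4, "cpu": 0.2, "memory": 0.2, "io": 0.2}
print(servers_for_mix(12_000, mix))  # 2
```

Because the I/O-bound share has by far the lowest per-server RPS, it dominates the total here; shifting load toward the cache lowers the server count.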