How to estimate requests per second of a server

Requests per second (RPS) refers to the number of requests a server can handle in one second. RPS is a fundamental metric in back-of-the-envelope calculations (BOTECs), which are approximate calculations performed under simplified assumptions. They are rough and might not be precise, but they are a valuable tool for estimating resources in the early stages of designing a system. RPS provides insight into system capacity, resource allocation, scalability, and performance optimization. We estimated the time of a single clock cycle for a CPU with a frequency of 3.5 GHz to be 0.286 ns. We’ll use this number to estimate the RPS of different servers in this answer.

Types of requests

Within a server, there are limited resources; depending on the type of client requests, different resources can become a bottleneck. Let’s understand three types of requests.

  • CPU-bound requests: These are requests where the limiting factor is the CPU. Throughput is bounded by how quickly the CPU can complete computationally intensive work.

  • Memory-bound requests: These requests are limited by the memory a machine has. Performance depends on the memory access speed, so it can take more clock cycles to process one byte of data.

  • I/O-bound requests: These are requests that depend on input or output from other local or remote devices, such as querying databases or requesting data from other application servers.

A depiction of resources handling their bounded requests

Estimating RPS for different request types

Let’s estimate the requests per second a server can handle for each request type.

CPU-bound RPS

A CPU-bound request requires extensive computation by the CPU, such as matrix multiplication, cryptography, or data compression, and can take millions of CPU clock cycles. The RPS for such requests is calculated as follows:

$RPS_{CPU} = Num_{CPU} \times \frac{1}{Task_{time}}$

The following terms are used in this calculation:

  • $RPS_{CPU}$: The number of CPU-bound requests handled per second.

  • $Num_{CPU}$: The number of CPU cores (or workers) available.

  • $Task_{time}$: The time each task takes to complete.

Suppose a request takes 8 million clock cycles. Then, for a system with 36 cores and a 3.5 GHz frequency, the RPS is calculated as follows:

Note: The time to process a request can vary depending on the nature of the request and the CPU's architecture (number of cores).
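The CPU-bound estimate can be checked with a short Python sketch. It assumes the relation implied by the terms defined above (RPS equals the number of cores divided by the time per task) and that each core handles one request at a time. The function and constant names are illustrative, not taken from the original.

```python
CLOCK_HZ = 3.5e9             # CPU frequency from the text (3.5 GHz)
CYCLE_TIME_S = 1 / CLOCK_HZ  # about 0.286 ns per clock cycle


def cpu_bound_rps(num_cores: int, cycles_per_request: float) -> float:
    """Estimate RPS, assuming each core processes one request at a time."""
    task_time_s = cycles_per_request * CYCLE_TIME_S
    return num_cores / task_time_s


# Example from the text: 8 million cycles per request on 36 cores.
task_time_ms = 8e6 * CYCLE_TIME_S * 1e3
rps = cpu_bound_rps(num_cores=36, cycles_per_request=8e6)
print(f"task time: {task_time_ms:.2f} ms")  # about 2.29 ms
print(f"CPU-bound RPS: {rps:,.0f}")         # about 15,750
```

Without intermediate rounding, the result is about 15,750 requests per second, which is in line with the roughly 15,500 listed in the summary table.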

Memory-bound RPS

For memory-bound requests, the limiting factor is memory. The number of clock cycles per operation depends on the memory access speed and can therefore be higher. Reading or writing 1 byte to or from memory takes 16 clock cycles. So, the RPS for memory-bound requests is calculated as follows:

The following terms are used in this calculation:

  • $RPS_{memory}$: The number of memory-bound requests handled per second.

  • $RAM_{size}$: The total size of the RAM.

  • $Worker_{memory}$: The memory used by a worker that manages a request. Workers are processes or threads responsible for executing or managing task completion.

It takes 16 clock cycles to read or write 1 byte of data to or from memory. Let’s suppose each request contains 100 KB of data.

In that case, the time to handle a single request is as follows:

Suppose a system has 32 GB of memory, and each worker takes approximately 1 GB to manage requests. Multiple workers can combine to perform a single operation. In our case, let’s suppose two workers handle a single memory-bound request. The RPS for memory-bound requests is:

Note: The RPS can vary depending on the total memory of the server and the memory each worker takes as well as the number of workers to handle a single request. Moreover, the time to perform an operation (read or write) can also vary according to data size.
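The memory-bound example can be sketched in Python as follows. The sketch assumes that the number of requests in flight equals the number of workers that fit in RAM divided by the workers needed per request, and that 100 KB means 100,000 bytes. Both are assumptions of this sketch, and the names are illustrative.

```python
CLOCK_HZ = 3.5e9      # CPU frequency from the text (3.5 GHz)
CYCLES_PER_BYTE = 16  # memory read/write cost per byte, from the text


def memory_bound_rps(ram_gb: int, worker_gb: int,
                     workers_per_request: int, request_bytes: int) -> float:
    """Estimate RPS when memory limits how many requests run concurrently."""
    task_time_s = request_bytes * CYCLES_PER_BYTE / CLOCK_HZ
    total_workers = ram_gb // worker_gb
    concurrent_requests = total_workers // workers_per_request
    return concurrent_requests / task_time_s


# Example from the text: 32 GB RAM, 1 GB per worker, 2 workers per request,
# 100 KB per request (taken here as 100,000 bytes).
task_time_ms = 100_000 * CYCLES_PER_BYTE / CLOCK_HZ * 1e3
rps = memory_bound_rps(ram_gb=32, worker_gb=1,
                       workers_per_request=2, request_bytes=100_000)
print(f"task time: {task_time_ms:.3f} ms")  # about 0.457 ms
print(f"memory-bound RPS: {rps:,.0f}")      # about 35,000
```

Rounding the task time up to 0.5 ms gives 16 / 0.5 ms = 32,000 requests per second, which matches the value in the summary table.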

I/O bound RPS

I/O-bound requests, as stated earlier, depend on I/O devices to perform their operations. Depending on the location of those devices (within or outside of a data center, zone, or region), the time taken to execute an operation can vary. A region is a geographical location, a zone is an isolated location within a region, and a data center is the physical site of resources in a zone. A region can have multiple zones, and a zone can have multiple data centers. The change in execution time for a simple request in different scenarios is shown in the following slides:

A server communicating with database within a data center

Note: In general, a query takes approximately 0.03 to 3 ms to execute on a MySQL server. We considered 1.5 ms for the use case.

Considering the times from the slides, the RPS for I/O-bound requests can be estimated as follows:

So, if a server directly communicates with a database within a data center, the RPS is:

If a server communicates with a database through another server within a data center, the RPS is:

Note: The I/O-bound RPS of a server can vary depending on the number and location of resources it needs to communicate with.
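As a rough illustration of how latency translates into I/O-bound RPS, here is a minimal Python sketch. It assumes a single worker that waits for each request to finish, so RPS is the inverse of the total time per request. The per-step timings from the slides are not reproduced in this text, so the totals used below (0.25 ms and 0.4 ms) are simply the values implied by the 2500–4000 range in the summary table under this assumption. The function name is illustrative.

```python
from typing import Sequence


def io_bound_rps(step_latencies_ms: Sequence[float], workers: int = 1) -> float:
    """Estimate RPS from the latencies of each step a request waits on.

    Assumes the steps happen one after another and that each worker
    handles one request at a time.
    """
    total_time_s = sum(step_latencies_ms) / 1000
    return workers / total_time_s


# Total per-request times implied by the summary table (an inference,
# not values taken from the slides).
print(round(io_bound_rps([0.25])))  # 4000
print(round(io_bound_rps([0.40])))  # 2500
```

Passing several latencies, for example one per network hop plus the query time, shows how each extra hop lowers the achievable RPS.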

Cache RPS

User requests are often repetitive, so the server can answer them with cached data instead of accessing other resources. This saves time and lets the server respond to more user requests. The cache latency is approximately 4 clock cycles if the cache line size is 2 bytes (that is, it takes 4 clock cycles to read 2 bytes of data from the cache, or 2 clock cycles per byte).

So, the time to read a response of 100 KB of data from the cache is calculated as follows:

The RPS of cached requests is as follows:
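The cache estimate can be reproduced with a few lines of Python. The sketch assumes 100 KB means 100,000 bytes and that a single worker serves cached responses one at a time, so RPS is the inverse of the read time. The names are illustrative.

```python
CLOCK_HZ = 3.5e9          # CPU frequency from the text (3.5 GHz)
CYCLES_PER_BYTE = 2       # 4 clock cycles per 2-byte cache line
RESPONSE_BYTES = 100_000  # 100 KB, assuming 1 KB = 1,000 bytes


def cache_rps(response_bytes: int) -> float:
    """Estimate RPS for responses served entirely from the cache."""
    read_time_s = response_bytes * CYCLES_PER_BYTE / CLOCK_HZ
    return 1 / read_time_s


read_time_us = RESPONSE_BYTES * CYCLES_PER_BYTE / CLOCK_HZ * 1e6
print(f"read time: {read_time_us:.1f} us")             # about 57.1 us
print(f"cache RPS: {cache_rps(RESPONSE_BYTES):,.0f}")  # about 17,500
```

This agrees with the 17,500 requests per second listed for cache requests in the summary table.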

Hybrid RPS

Consider a server that handles hybrid requests and takes 3.75 ms to process each request, as discussed here. Its requests per second can be estimated as follows:

We can estimate the number of servers required to process $K$ requests per second. For $K = 12000$, the required number of servers is calculated as follows:

Therefore, approximately 45 servers would be required to handle 12,000 requests per second, each with 100 KB of data queried from the database and saved to memory after encrypting using a simple encryption function.
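The hybrid numbers can be verified with a short Python sketch. It assumes one request is processed at a time per server, so the per-server RPS is the inverse of the 3.75 ms processing time, and it rounds the server count up to a whole machine. The function name is illustrative.

```python
import math


def servers_needed(target_rps: int, per_server_rps: int) -> int:
    """Number of whole servers needed to sustain target_rps."""
    return math.ceil(target_rps / per_server_rps)


hybrid_task_time_s = 3.75e-3                # 3.75 ms per request, from the text
hybrid_rps = round(1 / hybrid_task_time_s)  # 266.67 rounds to 267

print(hybrid_rps)                          # 267
print(servers_needed(12_000, hybrid_rps))  # 45
```

Multiplying 267 by the 36 cores used in the CPU-bound example gives 9,612, which is the upper end of the hybrid range in the summary table.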

RPS of Varying Request Types

| Request type    | RPS of server |
|-----------------|---------------|
| CPU requests    | 15500         |
| Memory requests | 32000         |
| I/O requests    | 2500–4000     |
| Cache requests  | 17500         |
| Hybrid requests | 267–9612      |

Estimating servers

These RPS estimates are for illustration only. In practice, a request is usually a combination of all these bounds. We can derive formulas for the total RPS of a server and for the number of servers required to handle a given number of requests as follows:

  • Suppose there are $K$ total requests per second to be handled by a server. Out of those $K$ requests, the server’s cache handles 20% and responds directly.

  • The server handles the remaining requests with different bounds. Let’s say 20% of the requests are CPU bound, 20% are memory bound, 20% are I/O bound, and the remaining 20% are hybrid requests (requests that require all the components to work together to complete).

Let’s calculate the total requests of different types as follows:

We’ll use equations 1–5 to calculate the number of required servers to handle requests for an application using the following equation:

The percentage of each bound can vary depending on the use case or nature of the requests. Moreover, the RPS of CPU, memory, I/O, or cache can also vary depending on the clock cycles a request takes for completion.
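One way to turn this mix into a server count is sketched below in Python. The equations referenced above are not reproduced in this text, so the sketch assumes a simple model: each request type contributes its share of $K$ divided by the per-server RPS for that type, and the contributions are summed and rounded up. The per-type RPS values come from the summary table, taking the lower end of each range as a conservative choice. The names and the model are assumptions of this sketch.

```python
import math

# Per-server RPS by request type, from the summary table
# (lower end of each range, which is a choice made for this sketch).
RPS_BY_TYPE = {
    "cache": 17_500,
    "cpu": 15_500,
    "memory": 32_000,
    "io": 2_500,
    "hybrid": 267,
}

# Share of total traffic per request type: 20% each, as in the text.
SHARE_BY_TYPE = {kind: 0.20 for kind in RPS_BY_TYPE}


def estimate_servers(total_rps: float, shares: dict, rps_by_type: dict) -> int:
    """Sum the fractional server load of each request type and round up."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    load = sum(total_rps * shares[kind] / rps_by_type[kind] for kind in shares)
    return math.ceil(load)


print(estimate_servers(12_000, SHARE_BY_TYPE, RPS_BY_TYPE))  # 11
```

In this model the hybrid requests dominate the result, because their per-server RPS is far lower than that of the other types. Changing the shares or the per-type RPS values shows how sensitive the estimate is to the request mix.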

Copyright ©2024 Educative, Inc. All rights reserved