Operation time estimation in back-of-the-envelope calculations

Back-of-the-envelope calculations (BOTECs) are approximate calculations based on simplifying assumptions. They are rough and might not be precise, but they are a valuable tool for estimating resources in the early stages of system design. BOTECs let us set aside the nitty-gritty details of the system (at least at the design level) and focus on the more important aspects. To build and run any software system, we need the following resources:

Servers
Network bandwidth
Storage

For example, a service will need the following approximate resources to seamlessly handle the users' requests:

These calculations are based on assumptions about the workload/requirements of the system and the capacity of the servers/machines to be used in it. We must estimate the above resources at the design stage because doing so serves the following purposes:

Feasibility assessment: A rough estimate of key resources, such as servers, storage, and bandwidth, helps a designer assess whether the system is viable as proposed or needs to be revised.
Performance analysis: BOTECs help a designer analyze the system’s performance by considering factors like data size, bandwidth, and the number of requests handled by a server.
Cost estimation and resource planning: After analyzing BOTECs, system designers can estimate costs and plan the resources the system will need.
Trade-off analysis: System designers can adjust the configuration to balance cost, performance, and resource utilization.

Quick resource estimation and a high-level design overview also support effective system design, because the estimates can be iterated on until they meet the requirements.

There are many different types of servers, each designed for specific jobs. What a server can do depends on how powerful it is: how fast it processes information, how much memory it has, and how much data it can store. To estimate the resources a system requires, it is crucial to select a server type that matches the demands of the application or service. In the subsequent sections, we will delve deeper into these practical considerations, particularly the estimation of operation time.

Operation time estimation

The time taken by the central processing unit (CPU) to process a request depends directly on the clock rate, that is, the speed at which the CPU’s clock generates pulses, which is given by the CPU’s frequency. Each request involves performing calculations, executing a query, or reading/writing data from/to memory. The time to process such a request depends on the number of clock cycles (cc) it takes to complete. For example, it takes one clock cycle to add two numbers, $16\ clock\ cycles$ to write a number to memory, etc.

Note: We'll use conventions like bn for billions, ms for milliseconds, and ns for nanoseconds.

Let’s assume we have a CPU with a frequency of $3.5\ GHz$ (approx. 3.5 billion (bn) cycles/sec).
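As a quick sanity check, the duration of one clock cycle follows directly from the frequency. The Python sketch below is only illustrative: the helper names `cycle_time_ns` and `cycles_to_ms` are made up for this example, and it assumes that each clock cycle executes a single instruction.

```python
# Illustrative helpers (hypothetical names), assuming one instruction per clock cycle.
CPU_FREQUENCY_HZ = 3.5e9  # 3.5 GHz, the CPU assumed in this lesson


def cycle_time_ns(frequency_hz: float = CPU_FREQUENCY_HZ) -> float:
    """Return the duration of one clock cycle in nanoseconds."""
    return 1e9 / frequency_hz


def cycles_to_ms(cycles: float, frequency_hz: float = CPU_FREQUENCY_HZ) -> float:
    """Convert a number of clock cycles into milliseconds."""
    return cycles / frequency_hz * 1e3


print(f"{cycle_time_ns():.3f} ns per cycle")             # about 0.286 ns
print(f"{cycles_to_ms(16) * 1e6:.2f} ns for 16 cycles")  # about 4.57 ns
```

Any operation whose cost is known in clock cycles can be converted to time the same way.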

Though a real-world request might combine multiple operations, we’ll begin by calculating the clock cycles and then the time for specific operations to gain a better understanding. As we progress, we’ll gradually explore more realistic examples.

Note: Instruction-level parallelism (ILP) is also possible in modern systems and can be achieved by techniques such as pipelining, superscalar execution, and out-of-order execution. Though ILP increases the instructions per clock cycle (IPC), a metric that measures how efficiently a processor executes instructions, in our case, we assume that each clock cycle executes a single instruction only.

Let’s estimate the time taken by different operations in the subsequent sections.

Encrypting data

Suppose a request needs the server to encrypt $10\ KB$ of data using the Advanced Encryption Standard (AES). Assuming a common implementation of AES, encryption takes 10 to $15\ clock\ cycles$ per byte. In that case, to complete the encryption, it takes:
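The arithmetic can be sketched in Python as follows. This is a rough illustration rather than a benchmark: it assumes the $3.5\ GHz$ CPU from above, one instruction per clock cycle, and $1\ KB = 1{,}000$ bytes, and the function name `encryption_time_ms` is hypothetical.

```python
# Sketch of the AES estimate; assumes 1 KB = 1,000 bytes and one instruction per cycle.
CPU_FREQUENCY_HZ = 3.5e9   # 3.5 GHz CPU from the text
BYTES_PER_KB = 1_000       # assumption: decimal kilobytes


def encryption_time_ms(size_kb: float, cycles_per_byte: float) -> float:
    """Estimate the time in milliseconds to encrypt `size_kb` kilobytes."""
    total_cycles = size_kb * BYTES_PER_KB * cycles_per_byte
    return total_cycles / CPU_FREQUENCY_HZ * 1e3


best = encryption_time_ms(10, cycles_per_byte=10)   # 100,000 cycles
worst = encryption_time_ms(10, cycles_per_byte=15)  # 150,000 cycles
print(f"10 KB: {best:.4f} ms to {worst:.4f} ms")    # roughly 0.0286 ms to 0.0429 ms
```

Per kilobyte, this works out to roughly $0.003$ to $0.004\ ms/KB$ under these assumptions.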

Accessing memory

Based on the above assumptions, it takes $0.0045\ ms/KB$ to read or write data to or from the memory.

Querying a database

A database takes a minimum of approximately $100,000\ clock\ cycles$ to execute a simple query of $10\ KB$, depending on the specific RDBMS implementation, system configuration, hardware infrastructure, and workload characteristics. It involves the following operations:

Parsing and optimizing queries
Retrieving necessary data to memory
Retrieving matching records and filtering them
Returning results

So, the time to execute a query is as follows:

Based on the above assumptions, it takes $0.03\ ms/KB$ to query a database.

Note: In real-world scenarios, the time taken to execute a database query can vary significantly based on various factors, such as the complexity of the query, hardware performance, database optimizations, and the size of the data being processed. It's important to benchmark and profile the system to get more accurate time measurements.

Estimating time for a hybrid request

So far, we have estimated the time to process requests with specific operations. Now, let’s consider a more realistic scenario to evaluate the time and, hence, the capacity of the server to process such requests:

Consider a request in which $100\ KB$ of data is retrieved from the database, encrypted using the Advanced Encryption Standard (AES), and then stored in memory. We need to estimate the time to process this request and the processing capacity of the server.

We must consider the time it takes to query the data, encrypt it using the AES, and store it in the memory. From the above calculations, the time taken for each operation is:
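A minimal sketch of this sum in Python is shown below. The query and memory rates are the per-KB figures quoted in this lesson. The encryption rate of $0.003\ ms/KB$ is an assumption, corresponding to about 10 clock cycles per byte on the $3.5\ GHz$ CPU. The names `RATES_MS_PER_KB` and `hybrid_request_time_ms` are hypothetical.

```python
# Per-KB costs in milliseconds. The query and memory rates are the figures quoted
# in this lesson; the encryption rate is an assumption (about 10 cycles per byte).
RATES_MS_PER_KB = {
    "query": 0.03,
    "encrypt": 0.003,   # assumed, not quoted in the text
    "memory": 0.0045,
}


def hybrid_request_time_ms(size_kb: float) -> float:
    """Sum the per-operation estimates for one request of `size_kb` kilobytes."""
    return sum(rate * size_kb for rate in RATES_MS_PER_KB.values())


total_ms = hybrid_request_time_ms(100)
print(f"{total_ms:.2f} ms per 100 KB request")           # about 3.75 ms
print(f"{1000 / total_ms:.0f} sequential requests/sec")  # about 267
```

The requests-per-second figure assumes the requests are handled one after another on a single core, so it is only a rough indication of capacity.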

Based on the above assumptions, it takes $3.75\ ms$ to process a hybrid request of $100\ KB$. The above calculations are for the ideal case where the servers and databases are placed within a data center, and the server handles all the requests. If a server needs to query a database through another server or servers within a zone, then the time to process a request varies greatly.

Considering a CPU with a frequency of $3.5\ GHz$, we estimated the duration of a single clock cycle to be $0.286\ ns$. We’ll use this number to estimate the requests per second a server can handle for different request types, based on the number of clock cycles required to complete an operation (varying from a few clock cycles to millions of clock cycles).
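To illustrate how the cycle count translates into throughput, the Python sketch below computes an upper bound on requests per second. It assumes a single core that processes requests back to back with one instruction per clock cycle, and it ignores network, disk, and queuing delays. The function name `requests_per_second` is hypothetical.

```python
CPU_FREQUENCY_HZ = 3.5e9  # 3.5 GHz CPU from the text


def requests_per_second(cycles_per_request: float) -> float:
    """Upper bound on requests/sec for one core handling requests back to back."""
    return CPU_FREQUENCY_HZ / cycles_per_request


for cycles in (1_000, 100_000, 1_000_000):
    rps = requests_per_second(cycles)
    print(f"{cycles:>9,} cycles/request -> {rps:>11,.0f} requests/sec")
```

For example, a request that costs 100,000 clock cycles gives an upper bound of 35,000 requests per second under these assumptions.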
