Motivation

Most modern applications are data-oriented: they process data and present it to users in user-friendly formats. In dynamic applications especially, that data is updated continuously. A server stores this continuously changing information and serves it whenever connected devices or clients request it. This chapter focuses primarily on the Internet because it is a common way for customers to request services via APIs.

At the API design level, we must establish API SLAs that are realistically achievable with current technology and within our cost budget. For example, for voice calls over the Internet, a one-way latency of more than 100 ms starts to degrade the listener’s experience. In this case, we (as API and back-end designers) have a concrete threshold to target. We then need to examine the path end to end (from the client to the service) to see how the design can meet the goal (latency, in the case of voice over the Internet) and, if meeting it isn’t possible, how to mitigate the shortfall.
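
One way to reason about such a threshold is to split the end-to-end path into stages and check their sum against the budget. The Python sketch below does this. The stage names and per-stage numbers are hypothetical placeholders rather than measurements; only the 100 ms threshold comes from the text.

```python
# One-way latency threshold for voice calls, taken from the text above.
ONE_WAY_BUDGET_MS = 100

# Hypothetical per-stage estimates in milliseconds (illustrative values only).
stages_ms = {
    "client_capture_and_encode": 20,
    "network_transit": 50,
    "server_processing": 10,
    "client_decode_and_playback": 15,
}


def check_budget(stages, budget_ms):
    """Return (total latency, remaining headroom) for the given stages."""
    total = sum(stages.values())
    return total, budget_ms - total


total_ms, headroom_ms = check_budget(stages_ms, ONE_WAY_BUDGET_MS)
if headroom_ms >= 0:
    print(f"Within budget: {total_ms} ms used, {headroom_ms} ms to spare")
else:
    print(f"Over budget by {-headroom_ms} ms; some stage needs mitigation")
```

A negative headroom marks the point where the design has to change, or where a mitigation is needed for the stage that dominates the total.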

Over the years, major services such as Google Search have set high customer expectations. API designers can’t ignore these expectations, or their app might fail, because no one wants to use a slow app. Answering the following questions properly leads to an effective customer experience:

How quickly is the API acting on requests and sending responses back?
How does the increasing number of requests affect the performance of an API?

Depending on the operations they perform, different APIs have different latencies. To save or retrieve information, these APIs access different kinds of memory and storage, and each access takes time. We’ll use the standard numbers in the table below as the basis for our calculations.

Standard Latency Numbers

Operations	Time
CPU register access	0.5 ns
L1 cache access	0.9 ns
L2 cache access	2.8 ns
L3 cache access	10 ns–100 ns
Reading 1 MB from memory	9 μs
SSD write latency; round trip in the same data center (around 500 μs)	100 μs–1000 μs
Reading 1 MB sequentially from disk (2 ms); disk seek time (4 ms); intra-zone network latency (around 5 ms)	1 ms–10 ms
Network round trip between two zones (inter-zone); reading 1 GB of sequential data from memory on the same server	10 ms–100 ms
Password hashing algorithm; TLS handshake (250 ms–500 ms); network round trip between two regions; reading 1 GB of data sequentially from SSD on the same server	100 ms–1000 ms
Typical across-continent (zone-zone) transfer of 1 GB of data in 2023, assuming a 1 Gbps network (around 8 seconds)	>1 s
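
The 8-second figure in the last row follows from simple arithmetic: 1 GB is 8 gigabits, and a 1 Gbps link moves one gigabit per second. A minimal Python sketch of that back-of-the-envelope calculation is shown below. It assumes decimal units (1 GB = 10^9 bytes) and ignores propagation delay and protocol overhead; the function name is illustrative.

```python
GB = 10**9    # bytes in a gigabyte (decimal units, an assumption)
GBPS = 10**9  # bits per second in 1 Gbps


def transfer_time_seconds(size_bytes, bandwidth_bits_per_s):
    """Time to push size_bytes through a link of the given bandwidth.

    Ignores propagation delay, handshakes, and protocol overhead.
    """
    return size_bytes * 8 / bandwidth_bits_per_s


print(transfer_time_seconds(1 * GB, 1 * GBPS))  # 8.0
```

The same function gives a quick first estimate for other payload sizes and link speeds before the round-trip times from the table are added on top.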
