Introduction to Response Time in APIs

Get to know the different time factors affecting API performance.


Most modern applications are data-oriented: they process data and present it to users in user-friendly formats. In dynamic applications especially, the data updates continuously. A server stores this continuously updating information and serves it whenever connected devices or clients request it. In this chapter, we concern ourselves primarily with the Internet because it is a common way for customers to request services via APIs.

At the API design level, we must establish API SLAs that are realistically achievable with current technology and within our cost budget. For example, for voice calls over the Internet, a one-way latency of more than 100 ms starts to degrade the listener’s experience. In this case, we (as API and back-end designers) have a threshold to target. We then need to examine the path end to end (from the client to the service) to see how our design will meet the goal (latency, in the case of voice over the Internet) and, if that isn’t possible, how we’ll mitigate the shortfall.
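The end-to-end reasoning above can be sketched as a simple latency budget check. This is a minimal illustration, not a measurement: the 100 ms target comes from the voice-call example, while the function name `check_budget`, the component names, and every per-component value are hypothetical placeholders chosen only to show the arithmetic.

```python
# Illustrative latency budget check. All component values are made-up placeholders.
TARGET_ONE_WAY_MS = 100.0  # threshold taken from the voice-call example in the text


def check_budget(components_ms, target_ms=TARGET_ONE_WAY_MS):
    """Return (total, headroom, within_budget) for a dict of per-hop latencies in ms."""
    total = sum(components_ms.values())
    headroom = target_ms - total
    return total, headroom, headroom >= 0


# Hypothetical breakdown of one request path from client to service.
hypothetical_path = {
    "client_processing": 10.0,
    "network_transit": 60.0,
    "server_processing": 20.0,
}

total, headroom, ok = check_budget(hypothetical_path)
print(f"total={total} ms, headroom={headroom} ms, within budget={ok}")
```

If the headroom turns out negative, that is the point at which we would look for mitigation, for example by shrinking the largest component of the budget.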

Over the years, major services like Google Search have set high expectations among customers in general. API designers can’t ignore these expectations, or their app might fail because no one wants to use a slow app. Answering the following questions properly leads to an effective customer experience:

  • How quickly does the API act on requests and send responses back?

  • How does an increasing number of requests affect the performance of an API?
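One rough way to start answering these questions is to time a handler over a growing number of requests and look at the median and 95th-percentile response times. The sketch below assumes a hypothetical stand-in function, `fake_handler`, in place of a real API call, and it issues requests sequentially, so it does not model concurrent load or network delay.

```python
import statistics
import time


def fake_handler():
    """Hypothetical stand-in for an API call; replace with a real request."""
    sum(range(1000))


def measure_ms(handler, n_requests):
    """Call handler n_requests times sequentially; return each response time in ms."""
    samples = []
    for _ in range(n_requests):
        start = time.perf_counter()
        handler()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


for n in (10, 100, 1000):
    samples = measure_ms(fake_handler, n)
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20)[-1]
    print(f"{n} requests: median={statistics.median(samples):.4f} ms, p95={p95:.4f} ms")
```

Reporting a high percentile alongside the median is a deliberate choice here: a small share of slow responses can dominate the user's impression even when the typical response is fast.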

Depending on the required operations, different APIs may have different latencies. These APIs access different types of memory to save or retrieve information, which also takes time. We’ll use the standard numbers given in the table below in our calculations.
