Search API Design Evaluation and Latency Budget

Learn how the search API meets its non-functional requirements and its response-time targets.


When designing an API, optimizing one set of parameters may require relaxing another because of the tradeoffs between them. The preceding lessons covered various aspects of modeling a search API and focused mainly on its functionality. In this lesson, we focus on the non-functional aspects of the search API and how we meet them.

Non-functional requirements

The non-functional requirements are discussed below.


Availability and reliability

Rate limiting and API monitoring improve the availability of the search API by keeping excessive traffic from overwhelming the API and the back-end servers. Similarly, to avoid cascading failures in the internal services, we employ circuit breakers at various points. These improve not only the availability of our API but also its reliability.
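To make the circuit breaker idea concrete, here is a minimal sketch in Python. The class name, the failure threshold, and the reset timeout are illustrative assumptions, not values from the search API design. After a number of consecutive failures the breaker rejects calls immediately, and once the timeout has passed it lets a single trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # failure_threshold and reset_timeout are assumed example values.
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the breaker can be tested
        self.failures = 0           # consecutive failures seen so far
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of adding load to a struggling service.
                raise CircuitOpenError("downstream service is temporarily disabled")
            # Timeout elapsed: half-open state, allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A gateway could wrap each call to an internal service, for example `breaker.call(index_service.lookup, query)` with a hypothetical `index_service`, and return a fallback response when `CircuitOpenError` is raised.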


Scalability

We scale the API by running redundant servers at the back end, so that when one server goes down, another is on standby to handle the search queries. We also cache the results of frequently searched queries. In addition, we use caching technologies between the client and our services to deliver static content. This reduces the load on our servers and, consequently, lets us handle a large number of queries.
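As a rough illustration of caching results for frequently searched queries, the sketch below combines a time-to-live with least-recently-used eviction. The class name, the default sizes, and the query normalization are assumptions made for the example. A production deployment would more likely use a shared cache service than an in-process dictionary.

```python
import time
from collections import OrderedDict


class QueryCache:
    # max_entries and ttl_seconds are assumed example values.
    def __init__(self, max_entries=1000, ttl_seconds=60.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = OrderedDict()  # normalized query -> (expires_at, results)

    @staticmethod
    def _normalize(query):
        # Treat "  Pizza  NYC" and "pizza nyc" as the same cache key.
        return " ".join(query.lower().split())

    def get(self, query):
        key = self._normalize(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, results = entry
        if self.clock() >= expires_at:
            del self._entries[key]      # entry is stale
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return results

    def put(self, query, results):
        key = self._normalize(query)
        self._entries[key] = (self.clock() + self.ttl_seconds, results)
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used


def search_with_cache(cache, query, backend_search):
    """Serve from the cache when possible; otherwise query the back end."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    results = backend_search(query)
    cache.put(query, results)
    return results
```

The time-to-live bounds how stale a cached answer can be, which is why this kind of cache suits generic queries whose results change slowly.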

Note: For more details on building scalable systems, see the Grokking Modern System Design Interview for Engineers & Managers course.


Security

We support TLS 1.2 and newer versions to provide a secure channel for exchanging data between the client and the server. Authentication in the search API can be handled in two ways:

  • A user without login: Since search is a public service, it’s possible to authenticate the requesting application (client) using only an API key.

  • A user with login: To receive a tailored response, end users can authenticate themselves using credentials such as a username and password. Alternatively, JSON Web Tokens (JWTs) can be used to obtain a personalized experience from the search service.
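The two authentication paths above can be sketched as follows. This is an illustration only: the HS256 signing algorithm, the shared secret, the `X-API-Key` header name, and the `API_KEYS` table are all assumptions, and a real service would normally rely on a maintained JWT library rather than hand-written verification.

```python
import base64
import hashlib
import hmac
import json
import time

API_KEYS = {"demo-key-123": "demo-app"}  # hypothetical API key -> application name


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Create an HS256-signed token (used here only to build test tokens)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


def verify_jwt(token: str, secret: bytes, now=None):
    """Return the claims if the signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if header.get("alg") != "HS256":
            return None
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError, AttributeError):
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if exp is not None and (not isinstance(exp, (int, float)) or current >= exp):
        return None  # expired or malformed expiry
    return claims


def authenticate(headers: dict, secret: bytes, now=None):
    """Identify the calling app by API key and, optionally, the user by JWT."""
    app = API_KEYS.get(headers.get("X-API-Key", ""))
    if app is None:
        return None  # unknown application
    user = None
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        claims = verify_jwt(auth[len("Bearer "):], secret, now=now)
        if claims is None:
            return None  # a token was presented but is invalid
        user = claims.get("sub")
    return {"app": app, "user": user}
```

A request carrying only the API key is served as an anonymous search, while a request that also carries a valid token can be personalized for the user named in its `sub` claim.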

Low latency

We use several techniques to reduce the latency of our search API. First, we use high-speed caches in the API gateway to store frequently searched queries that are generic and whose results don't change often. Second, on the server side, we set a maximum time threshold for generating the results of each search query. If a query takes longer than the threshold, execution is halted, and the results found within the time limit are returned to the user. Third, we employ pagination, which reduces network latency by returning results one page at a time instead of all at once, since a full result set may span hundreds of pages. Finally, performing the filtering before passing the results to the search server reduces the overall latency, as explained in the previous lesson.
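The time threshold and pagination techniques can be sketched as below. The function names, the in-memory candidate list, and the 1-indexed page numbering are assumptions for illustration. A real search back end would enforce the budget inside its query engine.

```python
import time


def search_with_deadline(candidates, matches, budget_seconds, clock=time.monotonic):
    """Scan candidates until finished or the time budget is spent.

    Returns (results, complete); complete is False when the scan was cut short.
    """
    deadline = clock() + budget_seconds
    results = []
    for doc in candidates:
        if clock() >= deadline:
            return results, False  # budget exhausted: return partial results
        if matches(doc):
            results.append(doc)
    return results, True


def paginate(results, page, page_size):
    """Return one 1-indexed page of results plus paging metadata."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    total_pages = (len(results) + page_size - 1) // page_size  # ceiling division
    return {
        "items": results[start:start + page_size],
        "page": page,
        "total_pages": total_pages,
        "has_next": page < total_pages,
    }
```

Returning a `complete` flag alongside partial results lets the client tell a full answer from a truncated one, and the `has_next` field lets it request further pages only when the user asks for them.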
