Introduction to the rate limiter

When an API becomes available to the public, there can be an influx of users and services. Anyone can call it at any time, as often as they want, potentially crowding out other legitimate users. Moreover, API providers have limited resources provisioned per unit of time, and they want their services to be fairly available to all of their customers. It’s good to have our APIs used by many people, but unrestricted access has drawbacks: too many requests can overwhelm API gateways. API owners therefore enforce a limit on the number of requests, or on the amount of data, coming from clients. This constraint on request count or usage is enforced by a component called the API rate limiter.

The API rate limiter throttles clients' requests that exceed the predefined limit per unit of time instead of disconnecting the clients. Throttling refers to controlling the flow by discarding some of the requests. Rate limiting can also be considered a security feature that helps prevent bot and DoS attacks, which can overwhelm a server with a burst of requests. Overall, rate limiting provides a protective layer when a large number of requests per unit of time (a spike, or a thundering herd) is directed at an API. (In computer science, the thundering herd problem occurs when a large number of processes or threads waiting for an event are all awoken when that event occurs, but only one is able to handle it. The woken processes compete for resources, possibly freezing the system, until the herd is calmed down again. [Wikipedia])
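To make the idea of throttling concrete, here is a minimal sketch of one simple strategy, a fixed-window counter, in Python. This is an illustrative toy, not the limiter design this chapter goes on to describe; the class name and parameters are assumptions for the example. Requests beyond the per-window limit are discarded rather than the client being disconnected.

```python
import time


class FixedWindowRateLimiter:
    """Illustrative fixed-window limiter: allow at most `limit`
    requests per `window_seconds`; excess requests are throttled."""

    def __init__(self, limit: int, window_seconds: float = 1.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True   # request is within the limit
        return False      # request exceeds the limit: throttle it


# A burst of 5 requests against a limit of 3 per second:
limiter = FixedWindowRateLimiter(limit=3, window_seconds=1.0)
results = [limiter.allow() for _ in range(5)]
print(results)  # first 3 allowed, the remaining 2 throttled
```

A production limiter would typically run as a shared service (so the count survives across gateway instances) and might use a smoother algorithm such as a token bucket or sliding window, which later sections of such designs usually cover.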
