Twitter API Design Evaluation and Latency Budget


After familiarizing ourselves with Twitter's services and their endpoints, in this lesson we'll focus on the last two aspects of designing the API: how to meet the non-functional requirements and how to estimate the response time of our Twitter API. We'll also discuss some interesting scenarios related to timelines and try to optimize the service using different approaches.

Non-functional requirements

Let's discuss how we can achieve the non-functional requirements of the API for Twitter services.


Availability

We keep the system available even during unexpected spikes (for instance, when a celebrity's Tweet must be delivered to millions of followers in a timely manner) by having loosely coupled, stateless services run separate tasks concurrently. One example of such loose coupling is the pub-sub service between the Tweet service and the timeline service. It decouples our two main services and queues the many concurrent Tweets posted during peak hours, so a burst of writes does not overwhelm the timeline service. Furthermore, a monitoring system helps us detect anomalies, such as a service being overloaded by excessive requests. To prevent such overload, we apply rate limiting, which caps how many requests a user can make to the Twitter API within a given time window. For example, a user can post a maximum of 15 Tweets per minute.
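The rate limit described above can be sketched as a per-user sliding-window counter. This is a minimal, hypothetical Python sketch: the class name SlidingWindowRateLimiter and its parameters are illustrative assumptions, the limit of 15 posts per 60 seconds simply mirrors the example in the text, and a production limiter would keep its counters in a shared store rather than in process memory.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per user within any `window_seconds` span.

    Hypothetical sketch: 15 posts per 60 s mirrors the lesson's example and is
    not a published Twitter quota.
    """

    def __init__(self, limit: int = 15, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested deterministically
        # Per-user timestamps of accepted requests, oldest first.
        self._hits: Dict[str, Deque[float]] = {}

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        hits = self._hits.setdefault(user_id, deque())
        # Drop timestamps that have slid out of the current window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # an API gateway would answer with HTTP 429 Too Many Requests
        hits.append(now)
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = SlidingWindowRateLimiter(clock=lambda: fake_now[0])
    results = [limiter.allow("alice") for _ in range(16)]
    print(results.count(True))  # 15
    fake_now[0] = 60.0
    print(limiter.allow("alice"))  # True: the earlier posts have left the window
```

A sliding window avoids the burst that a fixed per-minute counter permits at the boundary between two windows, at the cost of storing up to 15 timestamps per active user.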


Reliability

We use circuit breakers to detect failing services and recover from such situations as quickly as possible. We also eliminate single points of failure by routing each request to any available replica of a service. Furthermore, because our services are used by different clients (mobile and web), we adopt the backend for frontend (BFF) approach for our API gateway to make it reliable and available. The BFF pattern divides the API gateway layer into multiple API gateways, such as a desktop API gateway and a mobile API gateway, each handling requests from one type of client. For example, if the Twitter website goes down (which is very rare), the mobile application is not affected, because the BFF handles each frontend or client type independently.
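The circuit breaker and replica routing described above can be sketched together. This is a minimal, single-threaded Python sketch under stated assumptions: the names CircuitBreaker, CircuitOpenError, and call_any_replica, the failure threshold, and the reset timeout are all hypothetical, and real deployments usually get this behavior from a service mesh or a resilience library rather than hand-written code.

```python
import functools
import time
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    """Hypothetical three-state breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # the next call is let through as a trial
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("service marked unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial (half-open) or too many failures (closed) opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_any_replica(replicas: Dict[str, Callable[[str], T]],
                     breakers: Dict[str, CircuitBreaker], request: str) -> T:
    """Try each replica in turn, skipping those whose breaker is open."""
    last_error: Optional[Exception] = None
    for name, handler in replicas.items():
        try:
            return breakers[name].call(functools.partial(handler, request))
        except Exception as exc:  # includes CircuitOpenError from an open breaker
            last_error = exc
    raise RuntimeError("no healthy replica available") from last_error
```

Failing fast while a circuit is open keeps callers from piling requests onto an unhealthy replica, and the half-open trial lets that replica rejoin rotation automatically once it recovers.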


Scalability

Because HTTP is stateless, any server can handle any request, so we can distribute incoming requests across multiple servers. In addition, to serve a large number of users, we use a mixture of push and pull models for timeline updates, depending on the user type (active or inactive).
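The hybrid push and pull model described above can be sketched in memory. In this hypothetical Python sketch, a new Tweet is pushed (fan-out on write) into the cached timelines of active followers only, while an inactive user's timeline is assembled on read (pull). The TimelineService and Tweet names are illustrative, and Tweet IDs are assumed to increase over time so that sorting by ID gives newest-first order.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Tweet:
    tweet_id: int  # assumed to increase over time
    author: str
    text: str


@dataclass
class TimelineService:
    active_users: Set[str] = field(default_factory=set)
    followers: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))
    following: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))
    # Pre-generated home timelines for active users (the feed cache), newest first.
    feed_cache: Dict[str, List[Tweet]] = field(default_factory=lambda: defaultdict(list))
    # Each author's own Tweets, newest first, read by the pull path.
    user_tweets: Dict[str, List[Tweet]] = field(default_factory=lambda: defaultdict(list))

    def follow(self, follower: str, followee: str) -> None:
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def publish(self, tweet: Tweet) -> None:
        self.user_tweets[tweet.author].insert(0, tweet)
        # Push model: fan out on write, but only to followers who are active.
        for follower in self.followers[tweet.author]:
            if follower in self.active_users:
                self.feed_cache[follower].insert(0, tweet)

    def home_timeline(self, user: str, limit: int = 20) -> List[Tweet]:
        if user in self.active_users:
            return self.feed_cache[user][:limit]  # served straight from the cache
        # Pull model: merge the followees' Tweets at read time.
        merged = [t for a in self.following[user] for t in self.user_tweets[a]]
        merged.sort(key=lambda t: t.tweet_id, reverse=True)
        return merged[:limit]
```

Pushing only to active users avoids spending writes on cached timelines that nobody reads, while inactive users pay for a somewhat slower read when they return.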


Security

Twitter's services authenticate and authorize users with their credentials, which must be transmitted only over HTTPS (TLS) so that they are encrypted in transit. Regular users can authenticate with the Authorization: Basic <encodedCreds> header; because Base64 is an encoding rather than encryption, this scheme is safe only over TLS. For third-party logins (such as through Google or Apple), we adopt the OAuth/OpenID Connect (OIDC) authorization code flow with the PKCE (Proof Key for Code Exchange) mechanism.
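The two credential mechanisms above can be illustrated with Python's standard library. This is a minimal sketch, not Twitter's implementation: the hypothetical basic_auth_header shows how the <encodedCreds> value is formed (Base64 of username:password, as in RFC 7617), and the hypothetical pkce_challenge and make_pkce_pair show the S256 code challenge that a client sends with its authorization request under PKCE (RFC 7636).

```python
import base64
import hashlib
import secrets
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    # Base64 is an encoding, not encryption: anyone who intercepts this header
    # can decode it, which is why it must only travel over HTTPS/TLS.
    creds = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(creds).decode("ascii")


def pkce_challenge(verifier: str) -> str:
    # S256 method: BASE64URL(SHA-256(code_verifier)) with the '=' padding removed.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> Tuple[str, str]:
    # The verifier must be 43 to 128 unreserved characters; 64 random bytes
    # encode to 86 URL-safe characters. The client keeps the verifier secret,
    # sends the challenge with the authorization request, and later proves
    # possession by sending the verifier with the token request.
    verifier = secrets.token_urlsafe(64)
    return verifier, pkce_challenge(verifier)
```

Because only the client that generated the verifier can redeem the authorization code, an intercepted code is useless to an attacker, which is the point of adding PKCE to the code flow.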

Low latency

For systems like Twitter, where the number of users and the volume of data grow daily, we keep latency as low as possible by pre-generating timelines and storing them in the feed cache to serve active users. Since the system is read-heavy and users are most likely to read recent Tweets, it makes sense to keep those Tweets in the cache. Furthermore, we adopt cursor pagination, which uses a next-cursor pointer to fetch Tweets page by page and avoids unnecessary load on the network, the client, and the server. Also, links to popular content (such as images, videos, and scripts) are generated dynamically to direct each user to the nearest CDN, so clients download popular media from CDN servers close to them, which improves response time. The following illustration shows how users can retrieve trends or Tweets related to them from the nearest CDNs instead of fetching them from the origin server.
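The cursor pagination mentioned above can be sketched over a cached timeline. In this hypothetical Python sketch, the timeline is a list of Tweet IDs held newest first, and the cursor is an opaque, URL-safe token encoding the ID of the last Tweet the client received; the function names and the page size are illustrative assumptions.

```python
import base64
from typing import List, Optional, Sequence, Tuple


def encode_cursor(tweet_id: int) -> str:
    # Opaque to clients: they pass it back unchanged to fetch the next page.
    return base64.urlsafe_b64encode(str(tweet_id).encode("ascii")).decode("ascii")


def decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor.encode("ascii")).decode("ascii"))


def get_page(timeline: Sequence[int], cursor: Optional[str] = None,
             limit: int = 20) -> Tuple[List[int], Optional[str]]:
    """Return one page of Tweet IDs (newest first) and the cursor for the next page."""
    if cursor is None:
        start = 0
    else:
        last_seen = decode_cursor(cursor)
        # Resume at the first Tweet older than the last one the client received.
        start = next((i for i, tid in enumerate(timeline) if tid < last_seen),
                     len(timeline))
    page = list(timeline[start:start + limit])
    has_more = start + limit < len(timeline)
    next_cursor = encode_cursor(page[-1]) if page and has_more else None
    return page, next_cursor
```

Because the cursor records where the client stopped rather than an offset, new Tweets prepended to the cache between requests do not shift the next page. For example, if a client has received [10, 9, 8] and Tweets 12 and 11 then arrive at the top of the cache, the cursor still yields [7, 6, 5] next, whereas an offset of 3 would return [9, 8, 7] and repeat two Tweets.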
