Detailed Design of Twitter

Learn the complex storage strategy behind Twitter’s real-time platform. Analyze how specialized systems like Manhattan, Blobstore, and FlockDB are deployed alongside advanced caching and sharded counters to solve challenges like heavy hitters and ensure high availability in a web-scale System Design.

We'll cover the following...

Storage system
Cache
Observability
Real-world complex problems
The complete design overview

Storage system

Storage is a core component of any real-time system. Twitter employs a polyglot persistence architecture, selecting specific storage modelsA storage model represents a data store’s essential physical aspects. to optimize performance for different services. This section explores how Twitter evolved its storage strategy, moving between various databases and platforms to meet scaling demands.

Note: This lesson draws on insights from Twitter’s technical blogs.

Google Cloud: Twitter uses HDFS (Hadoop Distributed File System) across tens of thousands of servers to host over $300\ PB$ of data. This includes logs (client, Tweet, and timeline events), database backups, and analytics data. Data in HDFS is compressed using LZOLempel–Ziv–Oberhumer (LZO) is a lossless data compression algorithm that is focused on decompression speed. for efficiency. In 2018, Twitter adopted a partly cloudy strategy, migrating data from on-premise Hadoop clusters to Google Cloud. Initially, they moved Ad-hoc clusters and cold storage, while keeping real-time production clusters on-premise. Big data is stored in BigQueryA fully managed and highly scalable serverless data warehouse and accessed via Presto, a distributed SQL query engine.
Manhattan: To handle rapid user growth, Twitter initially attempted to replace MySQL with Cassandra. However, due to specific limitations, they deprecated Cassandra in 2014 and launched Manhattan, a proprietary, ...