Detailed Design of Twitter
Learn the complex storage strategy behind Twitter’s real-time platform. Analyze how specialized systems like Manhattan, Blobstore, and FlockDB are deployed alongside advanced caching and sharded counters to solve challenges like heavy hitters and ensure high availability in a web-scale System Design.
Storage system
Storage is a core component of any real-time system. Twitter employs a polyglot persistence architecture, selecting specific
Note: This lesson draws on insights from Twitter’s technical blogs.
Google Cloud: Twitter uses HDFS (Hadoop Distributed File System) across tens of thousands of servers to host over
of data. This includes logs (client, Tweet, and timeline events), database backups, and analytics data. Data in HDFS is compressed using for efficiency. In 2018, Twitter adopted a partly cloudy strategy, migrating data from on-premise Hadoop clusters to Google Cloud. Initially, they moved Ad-hoc clusters and cold storage, while keeping real-time production clusters on-premise. Big data is stored inLZO Lempel–Ziv–Oberhumer (LZO) is a lossless data compression algorithm that is focused on decompression speed. and accessed via Presto, a distributed SQL query engine.BigQuery A fully managed and highly scalable serverless data warehouse Manhattan: To handle rapid user growth, Twitter initially attempted to replace MySQL with Cassandra. However, due to specific limitations, they deprecated Cassandra in 2014 and launched Manhattan, a proprietary, ...