...

>

Detailed Design of Sharded Counters

Detailed Design of Sharded Counters

Understand the detailed System Design of sharded counters by analyzing creation, write, and read operations. Discover how to dynamically size shards and implement metrics-based selection to handle massive write bursts efficiently. Understand how sharded counters ensure high scalability and solve the Top K problem.

Detailed design

We’ll now discuss the three primary functionalities of the sharded counter: creation, write, and read, in detail. We’ll answer many important questions by using Twitter as an example. These questions include:

  • How many shards should be created against each new tweet?

  • How will the shard value be incremented for a specific tweet?

  • What will happen in the system when end users issue read requests?

Sharded counter creation

When a user posts a tweet, the createCounter API is called to create multiple counters for that tweet, such as:

  • Like counter

  • Reply counter

  • Retweet counter

  • View counter (for video tweets)

A core design decision is choosing the number of shards per counter. The shard count directly impacts write contention and read amplification. Too few shards increase write contention and throttle throughput. Too many shards increase read amplification because the system must fetch and aggregate values across multiple shards, potentially spanning nodes or regions. As shard count increases, write throughput improves, but read latency and coordination costs rise. This creates a fundamental trade-off between write scalability and read performance.

The number of shards is determined by an estimate of near-term write traffic for a counter. For tweets, this depends on factors like follower count. Tweets from users with millions of followers are assigned more shards because they are likely to receive a high volume of likes or retweets. Similarly, hashtags associated with popular or celebrity tweets may also receive sharded counters, as they can quickly trend.

Tweet activity typically follows a bursty, long-tailed pattern. Engagement spikes shortly after publication and then gradually declines. As traffic patterns evolve, the initial shard allocation can become either overprovisioned or insufficient. Some counters can ...