Detailed Design of Sharded Counters

Learn about the design of sharded counters in detail.

Detailed design

We’ll now discuss the three primary functionalities of the sharded counter in detail: creation, write, and read. Using Twitter as an example, we’ll answer several important questions, including:

  • How many shards should be created for each new tweet?

  • How will the shard value be incremented for a specific tweet?

  • What will happen in the system when read requests come from the end users?

Sharded counter creation

As we discussed earlier, when a user posts a tweet on Twitter, the \createCounter API is called. The system creates multiple counters for each new tweet. The main ones are:

  • Tweet like counter

  • Tweet reply counter

  • Tweet retweet counter

  • Tweet view counter, if the tweet contains a video
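
As a rough illustration of the creation step, the sketch below builds one sharded counter per counter type for a new tweet. The function name `create_counters`, the key format, and the fixed default of four shards are assumptions made for this example only; they are not the actual signature or behavior of the \createCounter API.

```python
# Hypothetical sketch: all names, the key format, and the default
# shard count are illustrative assumptions.
BASE_COUNTERS = ("likes", "replies", "retweets")


def create_counters(tweet_id: str, has_video: bool, num_shards: int = 4) -> dict:
    """Return a mapping from counter key to a list of zeroed shards."""
    names = list(BASE_COUNTERS)
    if has_video:
        # The view counter is created only for tweets that contain a video.
        names.append("views")
    return {f"{tweet_id}:{name}": [0] * num_shards for name in names}


counters = create_counters("tweet-1", has_video=True)
print(sorted(counters))
```

Here every counter gets the same number of shards; in practice the shard count would be chosen per counter.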

Now, how does the system decide the number of shards for each counter? The shard count is critical for good performance. If it is too small for a given write workload, we face high write contention, which results in slow writes. If it is too high for a given write profile, we incur extra overhead on reads, because the values must be collected from shards that might reside on different nodes in geographically distributed data centers. The cost of reading a counter rises linearly with the number of shards because the values of all of its shards must be added together. Write capacity, on the other hand, scales roughly linearly with the number of shards, so adding shards lets the counter absorb more write requests. There is therefore a trade-off between write speed and read performance. We’ll discuss how we can improve read performance later.
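
The trade-off can be seen in a minimal in-memory sketch. The class `ShardedCounter` and its method names are hypothetical, and the shards here are entries in a Python list, whereas in a real deployment they would live on separate nodes. A write touches one randomly chosen shard, while a read has to visit all of them.

```python
import random


class ShardedCounter:
    """Illustrative in-memory sharded counter (not a distributed implementation)."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [0] * num_shards

    def increment(self, delta: int = 1) -> None:
        # Pick a random shard so that concurrent writers are spread out
        # and rarely contend for the same shard.
        idx = random.randrange(len(self.shards))
        self.shards[idx] += delta

    def read(self) -> int:
        # A read must visit every shard, so its cost grows linearly
        # with the number of shards.
        return sum(self.shards)


counter = ShardedCounter(num_shards=8)
for _ in range(1000):
    counter.increment()
print(counter.read())  # 1000
```

With more shards, each `increment` is less likely to collide with another writer, but `read` has more values to fetch and add.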

The decision about the number of shards depends on many factors that together try to predict the short-term write load on a specific counter. For tweets, these factors include follower count. The tweet of a user with millions of followers gets more shards than the tweet of a user with few followers, because it is likely to receive many likes, often millions. Sometimes, a celebrity tweet includes one or more hashtags. The system also creates a sharded counter for each such hashtag because it has a high chance of becoming a trend.
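
One way to picture a follower-based decision is the heuristic sketched below. It is an assumption for illustration only: the function `choose_shard_count`, the ratio of one shard per 10,000 followers, and the upper bound of 1,024 shards are invented values, and a real system would combine follower count with other signals.

```python
def choose_shard_count(
    follower_count: int,
    followers_per_shard: int = 10_000,  # assumed ratio, not a real-world figure
    min_shards: int = 1,
    max_shards: int = 1024,  # assumed cap to keep read cost bounded
) -> int:
    """Estimate a shard count from the author's follower count."""
    estimate = follower_count // followers_per_shard + 1
    # Clamp the estimate so small accounts get at least one shard and
    # very large accounts do not make reads arbitrarily expensive.
    return max(min_shards, min(max_shards, estimate))


print(choose_shard_count(500))          # 1
print(choose_shard_count(5_000_000))    # 501
print(choose_shard_count(100_000_000))  # 1024
```

The cap reflects the trade-off described earlier: beyond some point, extra shards cost more on reads than they save on writes.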

Many human-centered activities often have a long-tailed activity pattern, where many ...