Detailed design

We will now discuss the three primary functionalities of the sharded counter (creation, write, and read) in detail. We will answer many important questions using Twitter as an example. These questions include: how many shards should be created against each new tweet?; how will the shard value be incremented for a specified tweet?; what will happen in the system when read requests come from the end-users?

Sharded counter creation

As we discussed earlier, when a user posts a tweet on Twitter, the create API is called. The system creates multiple counters for each newly created post by the user. The following is the list of main counters created against each new tweet:

  • Tweet like counter
  • Tweet reply counter
  • Tweet retweet counter
  • Tweet view counter in case tweet contains video

Now the question is, how does the system decide the number of shards in each counter? The decision on the number of shards is very critical for good performance. If the shard count is small for a specific write workload, we will face high write contention resulting in slow writes. On the other hand, if the shard count is too high for a particular write profile, we will encounter higher overhead on the read operation. The reason for slower reads is due to the collection of values from different shards (that might reside on different nodes inside geographically distributed data centers). The reading cost of a counter value will rise linearly with the number of shards because values of all shards of a respective counter will be added. The writes will scale linearly as we add new shards due to increasing requests. Therefore there is a tradeoff between making writes fast versus read performance. We will see later how we can improve read performance.

The decision about the number of shards depends on many factors, that collectively are trying to predict the write load on a specific counter in the short term. For tweets, these factors include followers count. The tweet of a user with millions of followers gets more shards than a user with few followers on Twitter because there is a possibility that their tweets will get many (probably millions) likes. Sometimes, a celebrity tweet has a hashtag(s). The system also creates the sharded counter for this hashtag counter because it has a high chance of getting into trends.

Many human-centered activities often have a long-tailed activity pattern, where many people are concentrated on a relatively small set of activities. Probably ever-shortening attending span might be playing a role here. That means, after some time the flurry of likes will die down and we might not need as many shards now for a counter as were once needed. Similarly, our initial prediction for future writes might turn out to be wrong, and we might need more shards to handle write requests. We need that our system could dynamically expand or shrink the number of shards based on the current need.

We need to monitor the write load for all the shards so that we can appropriately route requests to specific shards (possibly using load balancers). Such a feedback mechanism can also help us decide when to close down some of the shards for a counter and when to add additional shards. Doing so will not only provide good performance for the end-user but also utilize our resources near optimal levels.

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy