Introduction to the cluster level

At the single server level, we didn't have to worry about routing or replication. Once we start to deal with thousands of servers, we need to understand the problems that arise with them. The Memcache server clusters' key load is managed by consistent hashing, but there are still challenges to tackle. Network congestion, too many repeating cache misses, dynamic workloads, and cluster failures are all problems that we face at the cluster level and not at the single server level.

Clusters are manageable units of a data center. The number of nodes inside a cluster is configurable. Nodes inside a cluster can communicate with each other with low latency and high throughput (because they are often near each other). After scaling our key-value store on single nodes, the next level is to utilize multiple key-value stores in a cluster.

Overview of design problems at the cluster level

At the cluster level, we attempt to solve a read-heavy workload and a wide fan-outNew content needs to be notified to more number of users, whereas the creation of new content is more rare. . This wide fan-out occurs because a single web request will get routed to a few clusters and then will further trigger multiple Memcached requests.

  • Network congestion: Why do Memcache clusters face network congestion in the first place? We can explain this by giving the example of loading a user feed of posts. One web request can trigger tens, if not hundreds, of Memcached requests that are used to construct the feed.

  • Too many repeating cache misses: If we can't respond to a request quickly enough, the front-end servers consider it as a cache miss and the data will be fetched from a more costly path. So, we need mechanisms that reduce the rate of cache misses.

  • Diverse application needs: Different applications have different requirements for their caches, so there might be some set of key-value items which are very expensive to compute again, like all the birthdays of a user's friends. On the other hand, another set of key-value items, such as viral images that are recommended at random, needs to be replicated quickly, and it's okay if some of the servers miss it as users aren't following the page/person that shared it.

  • Cluster failures: Machine failure is inevitable in any distributed system. So what do we do when a request comes in and is routed to a cluster that has failed? There is a slight chance that the load might cause other clusters to fail, resulting in "cascading failures."

The overall design of Memcache

The individual layers of our cluster-level design are as follows:

Stateless client layer: A stateless client to the Memcached is responsible for optimal retrieval of items from Memcached servers.

Mcrouter layer: Mcrouters are used to route requests to multiple Memcached servers using consistent hashing. To the client layer, the Mcrouter layer is the same as the Memcached layer.

Memcached server layer: Stores and serves the actual key-value items.

Level up your interview prep. Join Educative to access 70+ hands-on prep courses.