Cluster Level of Memcache
Let's identify and solve the cluster-level inefficiencies in a large key-value store implementation.
We'll cover the following
- Introduction to the cluster level
- Overview of design problems at the cluster level
- The overall design of Memcache
- Network efficiency problem 1: Data dependencies
- Network efficiency problem 2: Comparing UDP and TCP
- Network efficiency problem 3: Incast congestion
- The repeating cache miss problem
- Diverse application needs problem
- Handling failures
- Summary
Introduction to the cluster level
At the single-server level, we didn't have to worry about routing or replication. Once we start to deal with thousands of servers, we need to understand the problems that come with that scale. Consistent hashing distributes the key load across the Memcache servers in a cluster, but there are still challenges to tackle. Network congestion, too many repeating cache misses, dynamic workloads, and cluster failures are all problems that we face at the cluster level and not at the single-server level.
Clusters are manageable units of a data center. The number of nodes inside a cluster is configurable. Nodes inside a cluster can communicate with each other with low latency and high throughput (because they are often near each other). After scaling our key-value store on single nodes, the next level is to utilize multiple key-value stores in a cluster.
Overview of design problems at the cluster level
At the cluster level, we attempt to handle a read-heavy workload while addressing the following problems:
Network congestion: Why do Memcache clusters face network congestion in the first place? We can explain this by giving the example of loading a user feed of posts. One web request can trigger tens, if not hundreds, of Memcached requests that are used to construct the feed.
Too many repeating cache misses: If we can't respond to a request quickly enough, the front-end servers treat it as a cache miss and fetch the data through a more costly path. So, we need mechanisms that reduce the rate of cache misses.
Diverse application needs: Different applications have different requirements for their caches. Some key-value items are very expensive to recompute, such as the birthdays of all of a user's friends. Others, such as viral images that are recommended at random, need to be replicated quickly, and it's acceptable for some servers to miss them because users aren't following the page or person that shared them.
Cluster failures: Machine failure is inevitable in any distributed system. So what do we do when a request comes in and is routed to a cluster that has failed? There is a slight chance that the load might cause other clusters to fail, resulting in "cascading failures."
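The "slow reply counts as a miss" behavior described above can be sketched as follows. This is a minimal illustration, not Memcache's actual client logic: `FakeCache`, `fetch`, the 50 ms default deadline, and the plain dictionary standing in for the costly path are all assumptions made for the example, and the network delay is simulated rather than measured.

```python
class FakeCache:
    """In-process stand-in for a remote cache server (hypothetical)."""

    def __init__(self, data, delay_s):
        self._data = dict(data)
        self._delay_s = delay_s  # simulated round-trip time in seconds

    def get(self, key, timeout_s):
        # Simulate a deadline: if the reply would arrive late, give up.
        if self._delay_s > timeout_s:
            raise TimeoutError(key)
        return self._data.get(key)  # None means the key is not cached


def fetch(key, cache, backing_store, timeout_s=0.05):
    """Return (value, source). A late cache reply is handled like a miss."""
    try:
        value = cache.get(key, timeout_s)
    except TimeoutError:
        value = None  # too slow: treat it exactly like a cache miss
    if value is not None:
        return value, "cache"
    # The more costly path: read from the slower backing store.
    return backing_store[key], "backing_store"


store = {"feed:42": "posts"}
fast_cache = FakeCache(store, delay_s=0.001)
slow_cache = FakeCache(store, delay_s=0.2)

print(fetch("feed:42", fast_cache, store))
print(fetch("feed:42", slow_cache, store))
```

Both caches hold the key, yet the slow one still sends the request down the costly path. That is why latency problems at the cluster level show up as extra load on the backing store.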
The overall design of Memcache
The individual layers of our cluster-level design are as follows:
Stateless client layer: A stateless Memcached client is responsible for retrieving items from the Memcached servers efficiently.
Mcrouter layer: Mcrouters route requests to multiple Memcached servers using consistent hashing. To the client layer, the Mcrouter layer looks the same as the Memcached layer.
Memcached server layer: Stores and serves the actual key-value items.
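To make the routing step concrete, here is a minimal sketch of consistent hashing with virtual nodes. It is not Mcrouter's implementation: the `HashRing` class, the use of MD5 for ring positions, the 100 virtual nodes per server, and the server names are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring (hypothetical helper, not Mcrouter code)."""

    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server owns several points so that load spreads more evenly.
        for i in range(self._vnodes):
            point = ring_position(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        # Walk clockwise to the first point after the key, wrapping around.
        idx = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[idx % len(self._points)]]


ring = HashRing(["mc-1", "mc-2", "mc-3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("mc-3")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed servers")
```

In this sketch, removing `mc-3` only remaps the keys that `mc-3` used to own. Keys on the other servers stay where they were, which is the property that makes consistent hashing attractive for a cache tier.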