System Design Deep Dive: Real-World Distributed Systems/

...

Cluster Level of Memcache

Let's identify and solve the cluster-level inefficiencies on a large key-value store implementation.

We'll cover the following...

Introduction to the cluster level
Overview of design problems at the cluster level
The overall design of Memcache
Network efficiency problem 1: Data dependencies
Network efficiency problem 2: Comparing UDP and TCP
Network efficiency problem 3: Incast congestion
The repeating cache miss problem
Diverse application needs problem
- Replication within pools
Handling failures
- Smaller outages
Summary

Introduction to the cluster level

At the single server level, we didn't have to worry about routing or replication. Once we start to deal with thousands of servers, we need to understand the problems that arise with them. The Memcache server clusters' key load is managed by consistent hashing, but there are still challenges to tackle. Network congestion, too many repeating cache misses, dynamic workloads, and cluster failures are all problems that we face at the cluster level and not at the single server level.

Clusters are manageable units of a data center. The number of nodes inside a cluster is configurable. Nodes inside a cluster can communicate with each other with low latency and high throughput (because they are often near each other). After scaling our key-value store on single nodes, the next level is to utilize multiple key-value stores in a cluster.

Overview of design problems at the cluster level

At the cluster level, we attempt to solve a read-heavy workload and a wide fan-outNew content needs to be notified to more number of users, whereas the creation of new content is more rare. . This wide fan-out occurs because a single web request will get routed to a few clusters and then will further trigger multiple Memcached requests.

Network congestion: Why do Memcache clusters face network congestion in the first place? We can explain this by giving the example of loading a user feed of posts. One web request can trigger tens, if not hundreds, of Memcached requests that are used to construct the feed.
Too many repeating cache misses: If we can't respond to a request quickly enough, the front-end servers consider it as a cache miss and the data will be fetched from a more costly path. So, we need mechanisms that reduce the rate of cache misses.
Diverse application needs: Different applications have different requirements for their caches, so there might be some set of key-value items which are very expensive to compute again, like all the birthdays of a user's friends. On the other hand, another set of key-value items, such as viral images that are recommended at random, needs to be replicated quickly, and it's okay if some of the servers miss it as users aren't following the page/person that shared it.
Cluster failures: Machine failure is inevitable in any distributed system. So what do we do when a request comes in and is routed to a cluster that has failed? There is a slight chance that the load might cause other clusters to fail, resulting in "cascading failures."

The overall design of Memcache

The individual ...

Prologue

File Systems

Google File System (GFS)

Google Colossus File System

Facebook's Tectonic File System

Databases

Google Bigtable

Google Megastore

Google Spanner

Key-value Stores

Many-core Key-value Store

Scaling Memcache

SILT

Amazon DynamoDB

Concurrency Management

Two-phase Locking (2PL)

Google Chubby Locking Service

ZooKeeper

Big Data Processing: Batch to Stream Processing

MapReduce

Spark

Kafka

Consensus

Understanding Consensus: Two Generals, FLP, & Byzantine Generals

Two-phase Commit

State Machine Replication

Paxos

Raft

Epilogue

Cluster Level of Memcache

Introduction to the cluster level

Overview of design problems at the cluster level

The overall design of Memcache