Ensure Scalability and Replication

Explore methods to ensure key-value store scalability and high availability. Implement consistent hashing to partition data efficiently, using virtual nodes to distribute load uniformly and prevent hotspots. Define peer-to-peer replication strategies to achieve durability across multiple storage nodes.

We'll cover the following...

Add scalability
- Consistent hashing
  - Use virtual nodes
    - Advantages of virtual nodes
Data replication
- Primary-secondary approach
- Peer-to-peer approach

Add scalability

Scalability requires distributing data across multiple storage nodes. As demand changes, we must dynamically add or remove nodes. To achieve this, we partition data to balance the load across the system.

A traditional partitioning method uses the modulus operator. For a system with 4 nodes, we want 25% of requests to go to each node. When a request arrives, we hash its key and compute the remainder modulo m. The result x (calculated as hash % m) determines which node processes the request.

The following slides explain this process:

Next, we will examine how to distribute data efficiently.

Consistent hashing

Consistent hashing manages load effectively by minimizing data movement during scaling. We visualize the hash space as a ring with values from $0$ to $n-1$ , where $n$ is the total number of available hash values.

Each node ID is hashed to assign it a position on the hash ring. Request keys are hashed using the same function to determine their position on the ring. The request is routed to the first node encountered when traversing clockwise from the key’s position on the ring.

When a new node joins the ring, it takes over a portion of the keys from its immediate successor. Other nodes remain unaffected. This ensures that only a small subset of keys moves, making scaling efficient. Since hashes are randomly distributed, the load is generally expected to be even on average.

The primary benefit of consistent hashing is that adding or removing nodes requires moving only a minimal number of keys. However, in practice, random distribution does not guarantee equal load. A server handling a large segment of the ring may receive a disproportionate share of storage and retrieval requests. This creates a hotspot, which can bottleneck the entire system.

As shown in the figure below, if the segment between nodes N4 and N1 is large, N1 handles significantly more requests than other nodes. This non-uniform distribution degrades performance.

Note: It’s a good exercise to think of possible solutions to the non-uniform load distribution before reading on.

Use virtual nodes

To distribute load more evenly, we use virtual nodes. Instead of mapping a physical node to a single point on the ring, we map it to multiple points using different hash functions.

For example, if we use three hash functions, each physical server appears at three distinct positions on the ring. When a request lands on the ring, it is processed by the next virtual node found clockwise, which maps back to a physical server. This interleaving makes the load distribution more uniform. Additionally, if a node has higher hardware capacity, we can assign it more virtual nodes, allowing it to serve a larger portion of requests.

Advantages of virtual nodes

Using virtual nodes offers several benefits:

Fault tolerance: If a node fails or undergoes maintenance, its workload is spread uniformly across multiple other nodes rather than overwhelming a single neighbor.
Capacity management: We can adjust the number of virtual nodes based on the heterogeneityThe system needs to exploit heterogeneity in its infrastructure. e.g., the work distribution must be proportional to the capabilities of the individual servers. It is essential to add new nodes with higher capacity without upgrading all hosts. of the physical infrastructure. A server with double the computational capacity can handle more virtual nodes and take on more load.

We have now made the key-value storage scalable. The next step is ensuring high availability.

Data replication

To ensure durability and availability, we replicate data across multiple nodes. Common strategies include primary-secondary or peer-to-peer replication.

Primary-secondary approach

In a primary-secondary architecture, one node (primary) handles write requests while others (secondaries) replicate data and serve read requests. This introduces replication lag. Furthermore, if the primary fails, the system cannot accept writes until a new primary is elected, creating a single point of failure for write availability.

We’ll use a peer-to-peer relationship for replication. We’ll replicate the data on multiple hosts to achieve durability and high availability. Each data item will be replicated at $n$ hosts, where $n$ is a parameter configured per instance of the key-value store. For example, if we choose $n$ to be $5$ , it means we want our data to be replicated to five nodes.

Each node will replicate its data to the other nodes. We’ll call a node coordinator that handles read or write operations. It’s directly responsible for the keys. A coordinator node is assigned the key “K.” It’s also responsible for replicating the keys to $n-1$ successors on the ring (clockwise). These lists of successor virtual nodes are called preference lists. To avoid putting replicas on the same physical nodes, the preference list can skip those virtual nodes whose physical node is already in the list.

In the illustration below, the replication factor $n$ is 3. For key “K,” data is replicated to nodes B, C, and D. For key “L,” it is replicated to C, D, and E.