...

>

Ensure Scalability and Replication

Ensure Scalability and Replication

Add scalability

Scalability requires distributing data across multiple storage nodes. As demand changes, we must dynamically add or remove nodes. To achieve this, we partition data to balance the load across the system.

A traditional partitioning method uses the modulus operator. For a system with 4 nodes, we want 25% of requests to go to each node. When a request arrives, we hash its key and compute the remainder modulo m. The result x (calculated as hash % m) determines which node processes the request.

The following slides explain this process:

We get the hash of the key and take modulus with the number of nodes to find the node that should process the request
1 / 3
We get the hash of the key and take modulus with the number of nodes to find the node that should process the request

We want to scale infrastructure with minimal disruption. However, modular hashing is inefficient for dynamic scaling. Adding or removing a node changes the divisor m, which alters the mapping for nearly all keys.

For example, if node 2 is removed, a key previously mapped to it might shift to node 1 because 10%3=110 \% 3 = 1. Since nodes cache data locally, this shift forces massive data migration (reshuffling) to the new target nodes, causing high latency and network congestion.

Next, we will examine how to distribute data efficiently.

Consistent hashing

Consistent hashing manages load effectively by minimizing data movement during scaling. We visualize the hash space as a ring with values from 00 ...