Add scalability
Scalability requires distributing data across multiple storage nodes. As demand changes, we must dynamically add or remove nodes. To achieve this, we partition data to balance the load across the system.
A traditional partitioning method uses the modulus operator. For a system with 4 nodes, we want roughly 25% of requests to go to each node. When a request arrives, we hash its key and take the remainder modulo m, where m is the number of nodes. The result, x = hash % m, identifies the node that processes the request.
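The modulus-based scheme can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the helper names `stable_hash` and `pick_node`, the choice of MD5 as the hash function, and the synthetic keys `key-0`, `key-1`, ... are all assumptions made for the example.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Assumption: MD5 is used only as a deterministic, well-mixed hash.
    # Python's built-in hash() is randomized per process for strings,
    # so it would not give the same node across restarts.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(key: str, m: int) -> int:
    # x = hash % m, where m is the number of nodes.
    return stable_hash(key) % m


# Count how many of 10,000 hypothetical keys land on each of 4 nodes.
counts = [0] * 4
for i in range(10_000):
    counts[pick_node(f"key-{i}", 4)] += 1

print(counts)
```

With a well-mixed hash, each of the four counters should end up close to a quarter of the keys, which is the even split the scheme aims for.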
We want to scale infrastructure with minimal disruption. However, modular hashing is inefficient for dynamic scaling. Adding or removing a node changes the divisor m, which changes the node assignment for most keys, so a large share of the data has to move.
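To get a feel for how disruptive a change in m is, the sketch below counts how many keys map to a different node when a 4-node system shrinks to 3 nodes. The helper names (`stable_hash`, `pick_node`, `moved_fraction`), the MD5-based hash, and the synthetic key set are assumptions for illustration only.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Assumption: MD5 serves as a deterministic stand-in for the system's hash.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(key: str, m: int) -> int:
    # Modular placement: node index is hash % m.
    return stable_hash(key) % m


def moved_fraction(keys, old_m: int, new_m: int) -> float:
    # Fraction of keys whose node index differs between the two cluster sizes.
    moved = sum(1 for k in keys if pick_node(k, old_m) != pick_node(k, new_m))
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_m=4, new_m=3)
print(f"{fraction:.1%} of keys changed nodes")
```

A key keeps its node only when hash % 4 equals hash % 3, which holds for 3 of every 12 consecutive hash values. So about 75% of keys are expected to move, even though only one node in four was removed.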
For example, if node 2 is removed, a key previously mapped to it might shift to node 1 because
Next, we will examine how to distribute data efficiently.
Consistent hashing
Consistent hashing manages load effectively by minimizing data movement during scaling. We visualize the hash space as a ring with values from