Different Partitioning Strategies in Databases
Understand how databases partition data across distributed systems using key range and key hash methods. Learn the advantages and disadvantages of each, and explore consistent hashing for balanced load distribution and smooth scaling. This lesson helps you grasp different partitioning strategies to optimize query performance and avoid hotspots.
This section will discuss two strategies to partition the dataset in a distributed environment:
Partition by a key range
Partition by key hash
Partition by a key range
In this strategy, we divide a continuous range of keys into buckets and assign each bucket to a partition. A single host instance can hold multiple partitions. The range of keys assigned to a bucket may or may not be continuous. Within each partition, keys are stored in sorted order, which facilitates range scan queries.
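As a rough illustration of the idea, the following Python sketch routes a key to a partition by binary search over a list of split points and keeps each partition's keys sorted, so that a range scan becomes a contiguous slice. The split points, node names, and helper names (BOUNDARIES, PARTITION_TO_NODE, partition_for, RangePartition) are hypothetical and chosen only for this sketch. They are not taken from the lesson's illustrated example or from any particular database.

```python
import bisect
from typing import List

# Hypothetical split points: partition i holds keys k with
# BOUNDARIES[i-1] <= k < BOUNDARIES[i]; the first and last partitions
# are open-ended.
BOUNDARIES = ["g", "n", "t"]

# Hypothetical placement of partitions on hosts; one host may hold several.
PARTITION_TO_NODE = {0: "node-1", 1: "node-1", 2: "node-2", 3: "node-3"}


def partition_for(key: str) -> int:
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(BOUNDARIES, key)


class RangePartition:
    """Keeps its keys sorted so a range scan is a contiguous slice."""

    def __init__(self) -> None:
        self.keys: List[str] = []

    def insert(self, key: str) -> None:
        # insort places the key at its sorted position.
        bisect.insort(self.keys, key)

    def scan(self, start: str, end: str) -> List[str]:
        """Return keys k with start <= k < end, in sorted order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        return self.keys[lo:hi]


# Route a few sample keys to their partitions.
partitions = [RangePartition() for _ in range(len(BOUNDARIES) + 1)]
for key in ["apple", "grape", "kiwi", "mango", "peach", "zucchini"]:
    partitions[partition_for(key)].insert(key)
```

With these assumed split points, partition_for("peach") selects partition 2, which the placement table maps to "node-2", and partitions[1].scan("h", "n") reads only the sorted keys of partition 1. A scan that spans several partitions would have to visit each overlapping partition in key order, which this sketch leaves out.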
In the example above:
We have 3 host instances, namely Node 1, Node 2, and Node 3.
Node 1 has 2 partitions. Partition 1 includes keys starting from A, B, C, D, E. Partition 2 includes keys starting from F, G, H, I, J.
Node 2 has 2 partitions. Partition 3 includes keys starting from K, L, M, N, O. Partition 4 includes keys starting from P, Q, ...