This section discusses two strategies for partitioning a dataset in a distributed environment:

  • Partition by a key range

  • Partition by key hash

Partition by a key range

In this strategy, we divide a continuous range of keys into buckets and assign each bucket to a partition. A single host instance can hold multiple partitions. The range of keys assigned to a bucket may or may not be continuous. Within each partition, keys are stored in sorted order, which makes range scan queries efficient.
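The idea can be illustrated with a minimal in-memory sketch. It is not a definitive implementation: the class name `RangePartitioner`, its methods, and the split points are hypothetical names chosen for illustration, and the sketch assumes each bucket covers one contiguous key range and that all partitions live in a single process rather than on separate hosts.

```python
import bisect


class RangePartitioner:
    """Hypothetical sketch: routes keys to partitions by key range."""

    def __init__(self, boundaries):
        # Assumption: `boundaries` are split points. Partition i holds keys k
        # with boundaries[i-1] <= k < boundaries[i]; the first and last
        # partitions are open-ended.
        self.boundaries = sorted(boundaries)
        self.partitions = [[] for _ in range(len(self.boundaries) + 1)]

    def partition_for(self, key):
        # Binary search over the split points finds the owning partition.
        return bisect.bisect_right(self.boundaries, key)

    def put(self, key):
        # Keep each partition sorted so range scans stay cheap.
        bisect.insort(self.partitions[self.partition_for(key)], key)

    def range_scan(self, low, high):
        # Return all keys in [low, high], touching only the partitions
        # whose ranges can overlap the query.
        result = []
        first = self.partition_for(low)
        last = self.partition_for(high)
        for idx in range(first, last + 1):
            part = self.partitions[idx]
            start = bisect.bisect_left(part, low)
            end = bisect.bisect_right(part, high)
            result.extend(part[start:end])
        return result
```

With split points such as `["g", "n", "t"]`, a scan for keys between `"h"` and `"p"` only needs to visit the two partitions covering that interval, and because each partition is sorted, the matching keys are found by binary search rather than a full scan.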
