Partitioning Techniques

Learn different partition techniques for a distributed system.

How we partition data in a distributed system matters a great deal, so we must choose a partitioning strategy that fits the data we have. Otherwise, things get complicated in the long run.

A good partitioning technique results in an even distribution of data among the nodes. If we have n nodes and scale out to 2n nodes, a good partitioning strategy lets us get close to twice the performance from our system.

Before discussing partitioning techniques, let’s first assume that each row of our data has a key, and that each key identifies its row uniquely. This is a common assumption across most databases.

Let’s discuss a few techniques.

Range-based partitioning

In range-based partitioning, data is partitioned based on ranges of the key. For example, say the keys are strings. Data whose keys start with ‘a’ through ‘j’ is stored on node 1, ‘k’ through ‘p’ on node 2, and ‘q’ through ‘z’ on node 3.
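
The letter ranges in this example can be sketched as a small lookup routine. This is only an illustration, not how any particular database implements it: the names UPPER_BOUNDS, NODES, and node_for_key are hypothetical, and the sketch assumes keys are non-empty strings that start with an ASCII letter, compared case-insensitively.

```python
from bisect import bisect_left

# Hypothetical layout taken from the example: each entry is the last
# initial letter that the node at the same index is responsible for.
UPPER_BOUNDS = ["j", "p", "z"]
NODES = ["node 1", "node 2", "node 3"]


def node_for_key(key: str) -> str:
    """Return the node whose letter range contains the key's first character."""
    if not key:
        raise ValueError("key must be a non-empty string")
    first = key[0].lower()
    if not "a" <= first <= "z":
        # Assumption: only keys starting with a-z are covered by the ranges.
        raise ValueError(f"key {key!r} does not start with a letter a-z")
    # Binary search for the first upper bound that is >= the initial letter.
    return NODES[bisect_left(UPPER_BOUNDS, first)]


if __name__ == "__main__":
    for key in ["apple", "kiwi", "zebra"]:
        print(key, "->", node_for_key(key))
```

Because the boundaries are kept sorted, finding the node for a key is a binary search over the boundaries, and keys that are adjacent in sort order usually land on the same node.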
