What is consistent hashing?
Problem
Consider a problem, where a set of records needs to be assigned to number of servers. An easy way of doing that is to evenly distribute records in each server. For that, we can use mod operation. For example, for a record we can get the server by r mod n formula.
Now the issue is if we want to add a new server or if we want to delete a server then nearly all the record keys to be remapped. This brings a great deal of overhead.
Consistent Hashing
In order to solve the above problem, consistent hashing is used because in this technique on average needs to be remapped, where is the number of records and is the number of servers slots. It should be done when:
- There are a number of servers that need to be scaled up or down depending upon the load.
- There are cache servers that need to be scaled.
There are many benefits to using consistent hashing:
- Scalability
- Load distribution
- Quick replication and partitioning of data
- Faster retrieval of keys as each server holds a limited number of keys
The key idea behind consistent hashing is that every record and server is mapped on the unit circle. Each record is then assigned to the first server that appears on the circle in a clockwise direction. This brings even distribution of records.
Here, if a new server is added to the unit circle then the records next to the server need to be updated whereas all the other records maintain the previous assignments. Similarly, when a server is removed from the unit circle, only records that are associated with it needs to be updated.
Free Resources