Version Vectors
Explore how version vectors work in distributed datastores to track updates across nodes. Understand their role in identifying concurrent data changes, managing synchronization between servers and clients, and handling conflicts. This lesson explains version vectors' update rules, limitations, and their differences from vector clocks, setting the stage for advanced techniques in distributed system consistency.
Comparing version vectors with vector clocks
As explained previously, vector clocks are used to maintain a logical form of time, which can then be used to identify when events are happening, especially in comparison to other events.
On the other hand, version vectors are better suited for applications that store data, where every data item is tagged with a version vector. In this way, data can potentially be updated in multiple parts of the system concurrently (e.g., when there is a network partition). So, the version vectors from the resulting data items can help us identify those items that can be reconciled automatically and those that require conflict resolution.
Version vectors maintain a state identical to that in a vector clock, and contain one integer entry per node in the system.
Rules for the protocol
The update rules for version vectors are slightly different: nodes can experience both local updates (e.g., a write applied at a server) or can synchronize with another node (e.g., when recovering from a network partition).
- Initially, all vectors have all their elements set to zero.
- Each time a node experiences a local update event, it increments its own counter in the vector by one.
- Each time two nodes and synchronize, they both set the elements in their vector to the maximum of the elements across both vectors