...


Versioning Data and Achieving Configurability

Define how vector clocks manage data versioning and resolve conflicts caused by network partitions in a key-value store. Learn to implement configurable consistency using the quorum system. Understand how r and w parameters control read/write trade-offs for performance and availability.

Data versioning

Network partitions and node failures can fragment an object’s version history (the record of the modifications made to the object) during updates. This results in multiple, potentially divergent copies of the same data. To prevent data loss, the system must accept these concurrent versions and reconcile them to maintain consistency.

Two nodes replicating their data while handling requests

To resolve the inconsistency, the system needs to track causal relationships between events, for example by using logical clocks or version vectors. Physical timestamps are unreliable in distributed systems because clocks can drift or become unsynchronized, so they cannot safely determine which request happened last.

Instead, we use vector clocks to maintain causality. A vector clock is a list of (node, counter) pairs associated with every version of an object. By comparing vector clocks, we can determine if two versions are causally related or if a conflict exists that requires reconciliation.
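The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: representing a clock as a dictionary from node ID to counter, and the function names descends, compare, and increment, are assumptions made for this example.

```python
from typing import Dict

# Assumed representation: node ID -> counter, e.g. {"A": 2, "B": 1}.
VectorClock = Dict[str, int]


def descends(a: VectorClock, b: VectorClock) -> bool:
    """Return True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the causal relationship between two versions."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_after_b"  # b is an ancestor of a and can be discarded
    if b_covers_a:
        return "b_after_a"  # a is an ancestor of b and can be discarded
    return "conflict"       # concurrent versions: reconciliation is required


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with the counter of `node` advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated
```

For instance, if node A writes twice and the result is then updated independently by nodes B and C, the two resulting clocks, {"A": 2, "B": 1} and {"A": 2, "C": 1}, are classified as a conflict because neither contains all the events of the other.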

How do we ensure data integrity in a key-value store?

Explain how metadata like versioning and checksums, which detect data corruption, help maintain data integrity and consistency in a key-value store.

Modify the API design

To enforce causality with vector clocks, each write request must carry the vector clock from the previous operation along with the originating node ID. The API is therefore updated so that clients send this information with every write request.

The get API call is updated as follows:

get(key)

Parameter    Description
key          The key whose value we want to get.

This returns an object (or a collection of conflicting objects) along with a context. The context contains encoded metadata, such as the object’s version.

The put API call is updated as follows:

put(key, context, value)

Parameter    Description
key          The key against which the value is stored.
context      The metadata for each object, such as its version.
value        The object to be stored against the key.

This function locates the correct node based on the key and stores the value there. To update an object, the client must provide the context it received from a previous get operation. This context allows the system to determine the version history via vector clocks. If a read request reveals divergent branches (conflicts), the system returns all objects at the leaves of the version history, along with their version information. The client (here, any frontend server in our trusted data center, not the end user) must then reconcile these versions into a single new version.
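To make the get and put flow concrete, here is a small single-process sketch in Python. It is a toy model built on several assumptions: the ToyStore class, the helper functions, and the extra node argument on put (standing in for the ID of the node that coordinates the write) are all hypothetical, and the context is simply the merged vector clock of the versions that were read.

```python
from typing import Any, Dict, List, Tuple

Clock = Dict[str, int]  # assumed representation: node ID -> counter


def descends(a: Clock, b: Clock) -> bool:
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def merge(clocks: List[Clock]) -> Clock:
    """Element-wise maximum of several clocks."""
    merged: Clock = {}
    for clock in clocks:
        for node, counter in clock.items():
            merged[node] = max(merged.get(node, 0), counter)
    return merged


class ToyStore:
    """In-memory stand-in for the store; keeps every conflicting version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[Clock, Any]]] = {}

    def get(self, key: str) -> Tuple[List[Any], Clock]:
        """Return all current versions of `key` plus a context covering them."""
        versions = self._versions.get(key, [])
        values = [value for _, value in versions]
        context = merge([clock for clock, _ in versions])
        return values, context

    def put(self, key: str, context: Clock, value: Any, node: str) -> None:
        """Store `value`; `node` is the (hypothetical) coordinating node ID."""
        new_clock = dict(context)
        new_clock[node] = new_clock.get(node, 0) + 1
        # Drop versions the new write supersedes; keep concurrent ones.
        survivors = [
            (clock, old) for clock, old in self._versions.get(key, [])
            if not descends(new_clock, clock)
        ]
        self._versions[key] = survivors + [(new_clock, value)]


store = ToyStore()
store.put("cart", {}, ["milk"], node="A")
_, ctx = store.get("cart")

# Two writes made from the same context, as could happen during a partition.
store.put("cart", ctx, ["milk", "eggs"], node="B")
store.put("cart", ctx, ["milk", "tea"], node="C")
conflicting, ctx = store.get("cart")  # both versions are returned

# The client reconciles the versions and writes back with the new context.
store.put("cart", ctx, ["milk", "eggs", "tea"], node="A")
resolved, final_ctx = store.get("cart")
```

Because the reconciling write carries a context that covers both conflicting versions, the store can discard them and keep only the merged value. A real system would also need replication and per-node routing, which this sketch leaves out.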

Note: This is similar to how Git handles merge conflicts between branches. If the system cannot automatically merge the versions, the client must resolve the conflict at the application level and submit the resolved value.

Vector clock usage example

Let’s consider an example. Say we have a write operation request. Node A handles the first version of the write request, E1, where E means event. The corresponding vector clock holds the node information and its counter, that is, [A,1]. Node A ...