Introduction

Some distributed systems may rely on node clocks to be synchronized to function. It is easy for unsynchronized clocks or misconfigured NTP clients to go undetected for long periods of time, since a drifting clock is unlikely to cause a system crash but a subtle and gradual data loss. For systems that must work with synchronized clocks, it is important that a check is kept on the offset of clocks for all nodes in a cluster. If any one of the clocks drifts too far away from the rest of clocks in the system, then the node with the offending clock should be considered dead and removed from the cluster.

To see how clocks in a distributed system can cause data loss, we’ll work through an example of a database that is written to by multiple clients and the strategy used to handle conflicts is last write wins. The node in the database that receives the write request assigns the timestamp to the write request and not the client which sends the write request. The “last” write is determined based on the timestamps of the write requests. The database node uses the time of day clock to assign the timestamp when recording the write. The writes to the system are ordered using the timestamps, even though the timestamps may have been from different nodes of the distributed database. Let’s see how such an attempt to order writes by timestamp is prone to data loss.

Say, the database comprises of three nodes and the clocks for each of the nodes are slightly ahead or behind of each other. In the following sequence of events we’ll denote the time of each node’s clock as nodex(hour:milliseconds_past_the_hourt) e.g. node1(5:61000), would mean that node1’s time of day clock has a reading of 5:01:01 AM. Given this context, consider two clients that attempt to write the same value to the database for a key K.

At the start the three nodes have their clocks read as:
node1(3:5510)
node2(3:6000)
node3(3:5002)
Client1 sends a write request K=5 to node1, which is recorded at node1 with the timestamp 3:5510. We’ll represent the write as a tuple (K=5, 3:5510). The timestamp in the tuple is that of the node recording
Node1 replicates client1’s write to other nodes. It sends the tuple (K=5, 3:5510) to nodes 2 and 3.
Node3 receives the tuple from node1 and records it but node2 doesn’t receive the tuple due to a network delay just yet. Furthermore, say network delay takes 3 milliseconds so that node3 receives the tuple from node1 at 3:5005 according to ...

Basics

Kafka Producer

Kafka Consumer

Kafka Internals

Conclusion

Appendix

Reference: Replication

Reference: Partitioning

Reference: Transactions

Reference: Issues in Distributed Systems

Working with Time Issues

Introduction