Solving the Distributed Snapshot Problem

Let's explore a seminal algorithm used for capturing distributed snapshots.

We'll cover the following

Chandy-Lamport algorithm

The Chandy Lamport algorithm solves the consistent snapshot problem in a distributed system.

Idea

The algorithm is based on the following main idea: a marker message is sent between nodes using the available communication channels that represent an instruction to a node to record a snapshot of the current state.

Working

The algorithm works as follows:

  • The node that initiates the protocol records its state and then sends a marker message to all the outbound channels.

Importantly, the marker is sent after the node records its state and before any further messages are sent to the channels.

  • When a node receives a marker message, its behaviour depends on whether the node has already recorded its state (while emitting the mark previously) or not.
    • If the node has not recorded its state, it records its state, and then it records the state of the channel cc the marker was received from as an empty sequence. It then sends the marker to all the outbound channels.
    • If the node has recorded its state, it records the state of the channel the marker was received from as the sequence of messages received from cc after the node’s state was recorded and before the node received the marker from cc.

Get hands-on with 1200+ tech skills courses.