Protocols for Maintaining Fault Tolerance: Part II

Learn to add new elements to a running state machine system.

So far, we have discussed only how to remove faulty elements; we have not yet explored how to add repaired or new elements to the system. Let's see how to integrate a new or repaired component into a system of state machine replicas.

Integrating repaired elements

It is not enough for the element being added to be non-faulty. It must also be in the right state to behave consistently with other components. Let's start by introducing some notation:

We define $e[r_i]$ as the state of a non-faulty element $e$ after processing requests $r_0$ through $r_i$. An element $e$ that joins a configuration after request $r_{join}$ must be in state $e[r_{join}]$ to behave consistently with the rest of the system once it has joined.
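As a minimal sketch of this notation, the Python snippet below computes $e[r_i]$ by replaying requests $r_0$ through $r_i$ against a deterministic transition function. The key-value request format and the names `apply_request` and `state_after` are assumptions made for illustration only; they do not come from the lesson.

```python
def apply_request(state, request):
    """Hypothetical deterministic transition for a tiny key-value element."""
    op, key, value = request
    new_state = dict(state)  # copy so earlier states are never mutated
    if op == "set":
        new_state[key] = value
    elif op == "delete":
        new_state.pop(key, None)
    return new_state


def state_after(requests, i):
    """Return e[r_i]: the state after processing requests r_0 through r_i."""
    state = {}
    for request in requests[: i + 1]:
        state = apply_request(state, request)
    return state


requests = [
    ("set", "x", 1),      # r_0
    ("set", "y", 2),      # r_1
    ("delete", "x", None),  # r_2
    ("set", "y", 5),      # r_3
]
r_join = 2

existing = state_after(requests, r_join)      # an element already in the system
joiner_ok = dict(existing)                    # joins holding e[r_join]
joiner_stale = state_after(requests, 0)       # joins holding only e[r_0]

next_request = requests[r_join + 1]
after_existing = apply_request(existing, next_request)
after_ok = apply_request(joiner_ok, next_request)
after_stale = apply_request(joiner_stale, next_request)

print(after_ok == after_existing)     # the correctly initialized joiner agrees
print(after_stale == after_existing)  # the stale joiner diverges
```

The joiner that starts from $e[r_{join}]$ stays in step with the existing element on the next request, while the one that starts from an older state does not, even though neither of them is faulty.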

An element is self-stabilizing if its current state is completely defined by a fixed number of previously processed inputs, say the last $k$ inputs. For such an element, it is enough to let it run until it has processed $k$ inputs; after that, its state no longer depends on how it started, so it will be in state $e[r_{join}]$. Non-self-stabilizing elements need different treatment. The following discussion covers two such cases.
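To make the self-stabilizing case concrete, here is a small sketch of an element whose state is just the last $k$ inputs it has processed. The `WindowSum` class and the sample inputs are hypothetical and chosen only for illustration. A newcomer that starts in an arbitrary state agrees with a long-running element once both have processed the same $k$ inputs.

```python
from collections import deque


class WindowSum:
    """Hypothetical self-stabilizing element: its state is the last k inputs."""

    def __init__(self, k, initial=()):
        # maxlen=k makes the deque discard the oldest input automatically.
        self.window = deque(initial, maxlen=k)

    def process(self, value):
        self.window.append(value)
        return sum(self.window)


k = 3
inputs = [4, 8, 15, 16, 23, 42]

veteran = WindowSum(k)
for value in inputs[:3]:
    veteran.process(value)

# The newcomer starts in an arbitrary (wrong) state.
newcomer = WindowSum(k, initial=[99, 99, 99])

outputs = []
for value in inputs[3:]:
    outputs.append((veteran.process(value), newcomer.process(value)))

print(outputs)
print(list(veteran.window) == list(newcomer.window))
```

The first outputs of the two elements differ, but after $k = 3$ shared inputs both hold the same window and produce the same output from then on.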

Logical clocks and fail-stop failures

When logical clocks are used and only fail-stop failures are assumed, all we need is the state of a single state machine replica $sm_i$. The state of $sm_i$ is correct because we know that $sm_i$ is non-faulty: under the fail-stop assumption, a faulty replica halts rather than continuing with a corrupted state. Let's consider the following three cases, in which the element being integrated is an output device, a client, or a state machine replica:

For an output device $e$, we require little information to integrate it. This may include setup and startup information and similar details that change infrequently and can be stored in the state machine replicas.
For a client $e$, we can obtain the required information from other clients.
For a state machine replica $e$, we can use information from any of the non-faulty state machine replicas $sm_i$.
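The state machine replica case above can be sketched as a simple state transfer. This is an illustration under stated assumptions, not the lesson's protocol: the `Replica` class, its `halted` flag (standing in for a detectable fail-stop failure), and the `integrate` helper are all hypothetical names, and the sketch ignores requests that arrive while the transfer is in progress.

```python
import copy


class Replica:
    """Hypothetical state machine replica holding a key-value state."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.last_applied = -1  # index of the last processed request
        self.halted = False     # fail-stop: a failed replica simply halts

    def process(self, index, request):
        if self.halted:
            return  # a halted replica processes nothing
        if index != self.last_applied + 1:
            raise ValueError("requests must be applied in order")
        key, value = request
        self.state[key] = value
        self.last_applied = index

    def snapshot(self):
        # Deep copy so the joiner never shares mutable state with the donor.
        return copy.deepcopy(self.state), self.last_applied


def integrate(new_replica, replicas):
    """Copy state from any replica that has not halted into new_replica."""
    donor = next(r for r in replicas if not r.halted)
    new_replica.state, new_replica.last_applied = donor.snapshot()
    return donor


log = [("x", 1), ("y", 2), ("x", 3)]
replicas = [Replica("sm_0"), Replica("sm_1")]
for index, request in enumerate(log[:2]):
    for replica in replicas:
        replica.process(index, request)

replicas[0].halted = True            # sm_0 fails by halting
repaired = Replica("sm_2")
donor = integrate(repaired, replicas)
replicas.append(repaired)

for replica in replicas:             # the next request reaches every replica
    replica.process(2, log[2])

print(donor.name, repaired.state == replicas[1].state)
```

Because the donor has not halted, the fail-stop assumption lets the joiner trust its snapshot without cross-checking it against other replicas.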

For an output device or client $e$, we can communicate state $e[r_{join}]$ to $e$ ...
