System Design Deep Dive: Real-World Distributed Systems/

...

Introduction to Consensus in Distributed Systems

Explore what we’ll study in the upcoming chapters regarding consensus in distributed systems.

We'll cover the following...

Motivation
Consensus algorithms

Motivation

Consensus is one of the oldest problems in designing large-scale systems, and we have many algorithms to achieve consensus under different circumstances. We need consensus in our systems for many tasks, such as electing a leader in primary-secondary configurations, consistent data replication across many replicas of a database or filesystem, distributed locking, distributed transactions, atomic broadcast, and more. It will not be an exaggeration to say that every large-scale distributed system uses consensus frequently in daily operations. In this section, we’ll study the theoretical foundations and practical algorithms to achieve consensus when nodes or networks can fail in different ways.

Consensus algorithms

We will discuss three practical consensus algorithms—Two-phase commit, Paxos, and Raft. These algorithms are commonly used in many systems, such as databases and replicated data stores. These consensus algorithms borrow heavily from decades of research to achieve consensus in a fault-tolerant manner. To better understand the practical consensus algorithms, we have included two chapters on consensus fundamentals and how to achieve safe replication using state machines. Let's discuss high-level details of our content.

Consensus fundamentals (Two Generals' Problem, the FLP impossibility, Byzantine Generals Problem)

Our first chapter establishes the baseline for distributed consensus on what is possible and what's not under different computational models. This chapter also formally defines the necessary terminology and provides a lens through which we can understand the chapters on two-phase commit, Paxos, and Raft.

Prologue

File Systems

Google File System (GFS)

Google Colossus File System

Facebook's Tectonic File System

Databases

Google Bigtable

Google Megastore

Google Spanner

Key-value Stores

Many-core Key-value Store

Scaling Memcache

SILT

Amazon DynamoDB

Concurrency Management

Two-phase Locking (2PL)

Google Chubby Locking Service

ZooKeeper

Big Data Processing: Batch to Stream Processing

MapReduce

Spark

Kafka

Consensus

Understanding Consensus: Two Generals, FLP, & Byzantine Generals

Two-phase Commit

State Machine Replication

Paxos

Raft

Epilogue

Introduction to Consensus in Distributed Systems

Motivation

Consensus algorithms

Consensus fundamentals (Two Generals' Problem, the FLP impossibility, Byzantine Generals Problem)