Introduction to Consensus in Distributed Systems

Explore what we’ll study in the upcoming chapters regarding consensus in distributed systems.

Motivation

Consensus is one of the oldest problems in designing large-scale systems, and we have many algorithms to achieve consensus under different circumstances. We need consensus in our systems for many tasks, such as electing a leader in primary-secondary configurations, consistent data replication across many replicas of a database or filesystem, distributed locking, distributed transactions, atomic broadcast, and more. It will not be an exaggeration to say that every large-scale distributed system uses consensus frequently in daily operations. In this section, we’ll study the theoretical foundations and practical algorithms to achieve consensus when nodes or networks can fail in different ways.

Consensus algorithms

We will discuss three practical consensus algorithms—Two-phase commit, Paxos, and Raft. These algorithms are commonly used in many systems, such as databases and replicated data stores. These consensus algorithms borrow heavily from decades of research to achieve consensus in a fault-tolerant manner. To better understand the practical consensus algorithms, we have included two chapters on consensus fundamentals and how to achieve safe replication using state machines. Let's discuss high-level details of our content.

Consensus fundamentals (Two Generals' Problem, the FLP impossibility, Byzantine Generals Problem)

Our first chapter establishes the baseline for distributed consensus on what is possible and what's not under different computational models. This chapter also formally defines the necessary terminology and provides a lens through which we can understand the chapters on two-phase commit, Paxos, and Raft.

Level up your interview prep. Join Educative to access 70+ hands-on prep courses.