Reaching Consensus in Distributed Systems

Learn the fundamental approaches to how processes and nodes reach a consensus on shared values in this lesson.


A fundamental aspect of distributed computing is consensus problems because the systems require an agreement among distributed processes. This section introduces the fundamental approaches related to the subject of how processes and nodes reach a consensus on shared values in order to achieve a generally accepted final state of data in distributed systems, despite failures in single processes. We’ll show historical approaches for solving these problems and their development of practical solutions, which are implemented nowadays. In the end, we’ll see that the blockchain is a reasonable approach to solve this problem in a fully trustless environment (Leslie Lamport et al. (1982)Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine general’s problem. ACM Trans. Program. Lang. Syst., 4(3):382-401, July 1982. and Marshall Pease et al. (1980)Marshall Pease, Robert Shostak, and Leslie Lamport. Reaching agreement in the presence of faults. J. ACM, 27(2):228-34, April 1980.)

In this section, we obtain different forms of consensus problems and we study how consensus problems are solvable with the use of different failure and timing models. We first introduce the problem of consensus and the related problem of the Byzantine agreement. Roughly speaking, both problems include the procedure for processes or nodes to agree on a single value or state after some of the processes or nodes might have failed. This means it should be possible for processes and nodes to reach an agreement even in presence of faults. We’ll introduce techniques that serve as solutions for the agreement problem in distributed systems by analyzing the conditions that must hold in order to apply specific protocols.

An important distinction will be the kind of failures that can occur in a single process of a distributed system. We need to examine their influence on how the system works and how to deal with them. As we’ll see, there are mainly two kinds of faults, namely crash failures, and arbitrary or so-called Byzantine failures. As a consequence, we’ll introduce two failure models, studying under which considerations they are able to achieve agreement. We’ll specify this problem, distinguishing between consensus in case of crash failures and Byzantine agreements in case of Byzantine faults.

A second important aim of the chapter is the distinction of whether the considered distributed system is synchronous or asynchronous. We’ll see that a synchronous system comes with strong timing assumptions, i.e., one can assume that there are bounds on the time of process execution and on message delays. On the contrary, we cannot make any timing assumptions in asynchronous systems.

This will lead us to the fundamental so-called FLP impossibility result, which states that consensus cannot be achieved deterministically in an asynchronous system that’s subject to even one single crash failure. Since only one crash failure makes deterministic agreement impossible, there’s consequently a need for techniques that are able to circumvent this impossibility. We outline the main approaches which guarantee the achievement of consensus even under asynchronous conditions.

As the last step, we introduce practical protocols that solve the consensus problem in real distributed systems. As we’ll see, the blockchain algorithm is such an algorithm that makes consensus possible.

Get hands-on with 1200+ tech skills courses.