Trusted answers to developer questions
Trusted Answers to Developer Questions

Related Tags

What is a two phase commit (2-PC)?

Abdul Monum

Two Phase Commit (2-PC) is a consistency protocol that is widely used in many classical distributed systems as well as database management. The main objective of this protocol is to have a distributed agreement among all participants on saving data updates under failures. In many distributed applications, we want all servers and their replicas to be consistent ,i.e., reflect the same data changes at all times. Similarly, in databases where a database might be partitioned into multiple machines for performance gains, a transaction might touch more than one partition. However, all partitions must be consistent. So, how do we guarantee consistency? Here’s where 2-PC comes in.

As the name suggests, 2-PC consists of two distinct phases that allow the system to rollback an update under failures. Let’s dive right into the protocol.

First Phase
1 of 4

First Phase

In the protocol, there is a special node called the Coordinator server which coordinates all operations among other nodes. The coordinator server can be selected through any Leader Election algorithm. First, the coordinator server sends a “prepare” message that asks all servers to prepare for an update. When servers receive this update, they save the update to the disk, but they do not commit the update, meaning that this is a temporary update that can be rolled back. Servers reply with a “Yes” or “No”- if a server sees the update and has saved it in the disk, it replies with a “Yes”. If, for example, the disk has been corrupted or there are other memory issues the server may reply back with a “No”. In the first phase of 2-PC, the update is temporary at each server.

Second Phase

In the second phase, if any of the servers reply with “No”, or if there is a timeout before all the replies are received, then the coordinator server issues an “Abort” message. The message stops the update and, once each server receives this message, they delete the update from the disk. If all of the servers reply back with “Yes” within the timeout value, the coordinator server issues a “Commit” message. Upon receiving this message, each server commits the update by pulling the update from the disk and transferring it into the server’s data store.

Once the update is finalized and usable, each server sends out an “OK” message to the coordinator server. As soon as the coordinator receives all the “OK” messages, it will determines if the update has been committed.

Failures in 2-PC

Lets now analyze how 2-PC deals with failures:

  • Servers only finalize the update after the “Commit” message, which implies that all servers are ready and have the update on the disk.
  • If a server voted “No”, it can abort right away and delete the update from the disk as it knows that the coordinator server would never send a “Commit” message after receiving a “No”.
  • Each server saves the tentative update to the disk before replying “Yes” or “No” in the first phase, so even if a server fails, the update is retrievable after crash recovery.
  • The coordinator server logs every decision, including each sent and received message. If a coordinator server fails, then it can resume after recovery using these logs.
  • If the coordinator server cannot recover, a new coordinator will selected through a Leader Election algorithm. In this case, the new coordinator has two ways to proceed:
    • it can poll other servers to get their responses again
    • it can start the data operation anew

Through the failure analysis, we can observe that 2-PC is a consistency protocol with a stringent correctness condition, meaning that if one server commits the update, no other server rejects it. If one server rejects it, no other server commits it. Therefore it is considered all-or-nothing.

RELATED TAGS

CONTRIBUTOR

Abdul Monum
Copyright ©2022 Educative, Inc. All rights reserved
RELATED COURSES

View all Courses

Keep Exploring