Synchronous vs. Asynchronous Replication

Explore the fundamental concepts of synchronous and asynchronous data replication in distributed systems. Understand how each method affects write latency, data durability, and system reliability. Learn about primary-secondary, primary-primary, and multi-primary replication strategies, and evaluate trade-offs to design resilient and scalable storage solutions.

We'll cover the following...

Database replication
Replication in real-world systems
Trade-offs in durability, latency, and failure recovery
Conclusion

Relying on a single server creates a single point of failure.

When that server fails, its data is inaccessible until the server recovers, and the resulting downtime frustrates users and undermines the reliability of the business. To build resilient and scalable distributed systems, we must eliminate these single points of failure by creating copies of our data across multiple machines.

This process, known as data replication, is a foundational concept in System Design.

The core question is not whether to replicate data but how to do it effectively. The strategy we choose directly impacts our system’s performance, consistency, and durability guarantees. Let’s explore the two main approaches to replication: synchronous and asynchronous.

Database replication

Database replication is the process of maintaining identical copies of data on multiple nodes, allowing reads to scale and enabling the system to survive failures.

A primary (leader) accepts writes; one or more replicas (followers) receive those changes to maintain up-to-date copies. The fundamental difference between synchronous and asynchronous replication lies in when the primary node confirms a write operation back to the client.

Let’s break down each approach step by step.

Synchronous replication

In synchronous replication, the primary node does not acknowledge a client’s write request until at least one replica has confirmed that it has received and persisted the data. The process typically follows these steps:

1. The client sends a write request to the primary node, which writes the data to its local storage.
2. The primary forwards the data to its replicas.
3. The replicas write the data to their local storage and send an acknowledgment back to the primary.
4. Once the primary receives confirmation from its synchronous replicas, it sends a success acknowledgment back to the client.

To illustrate this, here is the write path when the primary waits for replicas before acknowledging the client.
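
The same write path can also be sketched in code. What follows is a minimal in-memory sketch, not how any particular database implements replication. The `Primary` and `Replica` classes, the `healthy` flag, and the `required_acks` parameter are all hypothetical names introduced for illustration, and a plain dictionary stands in for local storage. The sketch assumes replicas are contacted one after another and that the primary does not roll back its local write when too few replicas confirm.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical follower; a dict stands in for its local storage."""
    name: str
    healthy: bool = True
    store: dict = field(default_factory=dict)

    def apply(self, key: str, value: str) -> bool:
        # The replica writes the data locally, then acknowledges to the primary.
        if not self.healthy:
            return False  # models a replica that cannot confirm the write
        self.store[key] = value
        return True


@dataclass
class Primary:
    """Hypothetical leader that acknowledges only after replicas confirm."""
    replicas: list
    required_acks: int = 1  # assumption: "at least one replica" must confirm
    store: dict = field(default_factory=dict)

    def write(self, key: str, value: str) -> str:
        # The primary writes the data to its own storage first.
        self.store[key] = value
        # It forwards the data to each replica and counts acknowledgments.
        acks = sum(replica.apply(key, value) for replica in self.replicas)
        # The client is acknowledged only once enough replicas have confirmed.
        # Rolling back the local write on failure is deliberately omitted here.
        if acks < self.required_acks:
            raise RuntimeError(
                f"only {acks} of {self.required_acks} required replicas confirmed"
            )
        return "ok"


if __name__ == "__main__":
    followers = [Replica("replica-a"), Replica("replica-b", healthy=False)]
    primary = Primary(replicas=followers)
    print(primary.write("user:1", "Ada"))  # one healthy replica is enough here
```

Because `write` returns only after the replicas have been contacted, the client's latency includes the replica round trip. Raising `required_acks` to 2 in this sketch would make the same write fail while `replica-b` is unhealthy, which shows how a stricter confirmation requirement ties write availability to replica health.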
