Understanding the Causal Consistency Model in 2026
Learn how causal consistency preserves cause-and-effect relationships in distributed systems, explore Lamport and vector clocks, understand consistency trade-offs, and discover why many modern applications rely on causal ordering.
Most of our daily life events are based on cause and effect—when we order a meal on our app, it gets delivered after some time. It can confuse people if such causality is not maintained in the digital world, such as a group chat application (WhatsApp or iMessage). More serious repercussions can happen, such as data loss, if, for example, a data store cannot find cause-and-effect relations in the read/write messages coming to it. However, as we will see later in this blog while cause-and-effect comes naturally to humans, enforcing causality in digital systems is not trivial.
is a distributed system model that ensures that causally related events appear in a consistent order across distributed nodes while permitting flexibility in the ordering of unrelated events. It balances data consistency and system availability in distributed computing. Causal consistency Lloyd, Wyatt, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. “A Short Primer on Causal Consistency.” ;login: 38, no. 4, August 2013.
Let's consider an example:
Three friends—Alice, Sally, and Bob—are chatting with each other using an app similar to WhatsApp. Alice asks the other two about a recipe (messages tagged as
in the following illustration). Sally receives this message and sends a subsequent message to Bob (message
). Because Bob hasn’t yet received message
(that caused message ), this message from Sally caused confusion for Bob.
Later on, message
caused message , but message reached earlier than to Alice (point marked as ), potentially ruining the recipe order.
It is clear from this simple example how a non-causal system can be a problem for its users.
Note: Chat apps like WhatsApp and iMessage work in a client-server model where chat participants’ messages are sent to a common server, and then that server disseminates the messages to everyone. In the following illustration, we have omitted such a server for simplicity. We can assume, for example, that Alice sent message
to a server, and that server then sent two separate messages to Sally and Bob. Since it is a group chat, each message needs to go to everyone. However, to reduce clutter, we don’t show some such message. For example, Sally’s first message also goes to Alice but we didn’t show it.
Data consistency models are like a contract between the end programmers and the system. Such a contract sets the expectation that how data writes and reads by multiple entities can interleave to provide a legal ordering of events. Many real-world applications rely on the guarantees provided by a consistency model for their correctness. For example, a bank’s storage system that provides strict serializability makes it possible for multiple (possibly concurrent) transactions to leave the account ledger in a consistent state.
System Design Deep Dive: Real-World Distributed Systems
Modern software systems are expected to operate at a massive scale while meeting strict reliability and latency requirements. Whether it’s a feed refresh, a payment request, or a real-time analytics query, users expect systems to respond instantly and consistently. That expectation has raised the bar for engineers today, understanding that System Design isn’t optional. It’s a core skill for building and evaluating production-grade systems. I built this course from my experience working on large-scale distributed systems at Microsoft (Azure) and Meta (Scuba), and from interviewing hundreds of candidates across both companies. The pattern I kept seeing was this: candidates understood individual components, but struggled to combine them into a coherent system. They knew what a cache or load balancer was, but not when or why to use it. This course is designed to bridge that gap. We start with the foundational building blocks of System Design, including databases, caching layers, load balancing, and messaging systems, and focus on how they interact under real-world constraints. From there, we analyze systems built by companies like Google, Facebook, and Amazon, breaking them down to understand the trade-offs behind each design decision. The goal is not just to learn concepts, but to develop the ability to reason through them in practice. This approach has helped a large number of engineers build stronger intuition for System Design and perform better in interviews. If you want to understand how real systems are designed and be able to design them yourself, this course gives you a clear, practical path forward.
We are surveying commonly used consistency models in databases and distributed systems. In our earlier blogs, we discussed the following hierarchy of consistency models. Linearizability and sequential consistency. This blog is about causal consistency. Let’s remind ourselves of some important aspects of this hierarchy.
The causal consistency is a weaker consistency model as compared to sequential consistency and linearizability.
Both sequential consistency and linearizability also have causal consistency.
When is causal consistency the right choice?#
One of the most important questions in distributed systems is not whether a consistency model is stronger or weaker, but whether it provides the right guarantees for a specific application. Strong consistency models such as linearizability provide a highly intuitive view of data, but those guarantees often come with increased latency and reduced availability in geographically distributed systems.
Consistency model | Typical use case |
Linearizability | Banking, payments, inventory systems |
Sequential consistency | Shared state applications |
Causal consistency | Messaging, collaboration, social feeds |
Causal consistency is particularly valuable when users care about preserving cause-and-effect relationships but do not require every operation to be globally ordered. A messaging application is a good example. Users expect replies to appear after the messages that triggered them, but they generally do not care whether two unrelated conversations are displayed in the exact same order everywhere.
This trade-off allows distributed systems to remain more available and responsive while still presenting behavior that feels natural to users. For many collaborative and social applications, preserving causality is often sufficient, making causal consistency a practical middle ground between strict correctness guarantees and system performance.
Causality and Happened-Before Relation#
In a real system, inferring causality is hard. Though, finding potential causality or deciding if an event happened before another is possible. In that aspect, the term causal consistency is a misnomer. Because this term is commonly used, we will use it, but we will mean potential causality whenever we say causality.
How to Implement Causality#
Logical clocks are a de facto mechanism to capture potential causality. Leslie Lamport invented logical clocks in his
Lamport Clocks#
If we need to tag each event on a system of processes with a timestamp that is coherent with potential causality, then day-of-time or monotonic clocks (commonly available in a computing device) are not appropriate due to imperfections such as
Leslie Lamport invented Lamport clocks, which can be implemented on each participating process/node using a simple integer that will be tagged with each event (local event or network sends and receives).
A logical clock is associated with each process and it starts with an initial value (say, 0). As any event happens locally on a process, the clock value is incremented by 1 and assigned to the event. So the first rule is that:
The clock value of event
should be less than the clock value of event if event happens before event on the same process.
The processes can only communicate with each other via network messages. The reception of a message happens after the sending of the message. Therefore, when a process sends a network message, it assigns a local clock value to this sending event and sends it along with the message. On the reception of such a message, the receiving process gets the max of the local clock value and the clock value received in the incoming message. Then, the receiving process adds one to the max value and assigns that value to the received message event. So our second condition is that:
If process
sends a network message to process , then the clock value associated with the sending of the message should be less than the reception of the same message.
In the following illustration, we show three processes on three separate nodes that can only communicate via messages. The dots on the timeline represent the events happening in a process. The first part of the tag (such as
Whenever an event
happens before an event , the clock value associated with is always greater than event .
Those events that are not associated with any potential causality (meaning they are neither on the same process nor is there any connection due to a network message), such events are called concurrent events. For such cases we can’t say if event
Note: The clock value of event 1 on process R (1) is less than the clock value associated with event 3 on process P (3) but R1 still didn’t happen before P3.
If a storage service receives two write requests from two different processes, and , with the clock value 1 and 10, respectively, then can the service infer that 's event happened before 's event? For this reason, would the service decide to do ’s write first, then ’s?
Synchronizer Example From Lamport Paper#
The above questions' discussion might convince us that the plain Lamport clocks are not very useful by themselves. However, we will now see an example where we construct a distributed synchronizer where the processes send messages to all the participants so that clock value inference can be made.
Algorithm Sketch#
Let’s assume there are
Each process also maintains a queue of requests. A command with logical timestamp
Our algorithm needs active participation from all processes. That means failure of a single process or persistent network failure can halt the whole system. More sophisticated algorithms are needed to deal with failures, though, those algorithms are out of the scope of our current blog. We will refer interested readers to
Let’s see a simple example involving just two processes.
A Replicated Key-Value Store#
Let’s assume a key-value store is replicated at two remote sites. Process get and put requests on the key-value store. In the following illustration, both processes concurrently try to change the value of the same key
Note that in a system of
Note: Lamport clocks are further improved in vector clocks, where we are able to establish two-way implication on the happened-before relationship. Each process maintains a vector of clock values—one entry for each process—and appends that vector in each message. As the number of participants increases in the system, the length of the vector increases, and such data consumes network bandwidth. Further optimizations are possible in a variant of vector clocks called dotted vector clocks.
Shortcomings of Logical Clocks#
There are a few shortcomings in the logical clocks that programmers should be aware of.
If our system cannot record out-of-band events, it can not infer potential causality for them. For example, if we order food on an app, but then call the restaurant to make a change in our order, the system is unable to see that the action taken by a person at the restaurant happened after the actions taken during the original order placing.
Programmers need to devise a strategy to deal with the concurrent events.
Programmers also need to devise how to infer true causality from the happened-before relationship provided by logical clocks.
Recipe Sharing With Logical Clocks#
Let’s return to our original example of recipe sharing between friends and see how using logical clocks helps us out. Here we will use vector clocks. A vector like [0,0,0] means that the first index is Alice’s clock value, the second one is Sally’s, and the last one is Bob’s. Now each message will be tagged with such clock values. Additionally, each message should reach every participant (for example, dashed lines
Note that at tag
Conclusion#
Our real-world interactions are strongly connected with the time of occurrence and how we see one thing happening after the other, possibly in cause-effect relationships. Logical clocks are a simple but powerful way to capture such potential causality in a networked system of processes. Many data stores use the causal consistency model to provide an intuitive order of operations on data.
The logical clocks and associated state machines assumed a non-faulty system in this blog. To learn how we deal with failing nodes and networks, see the “State Machine Replication” chapter in our course Grokking the Principles and Practices of Advanced System Design. You can also check out our Complete Guide to System Design
Practice real company-specific system design questions to boost your skills and confidence: