...

>

Unique IDs with Causality

Unique IDs with Causality

Design a distributed sequencer capable of generating unique IDs while preserving event causality. Compare physical clock solutions like Twitter Snowflake and Google TrueTime with logical clocks. Evaluate the trade-offs between consistency, scalability, and complexity in System Design.

Causality

In the previous lesson, we generated unique IDs. Beyond uniqueness, systems often need to reason about event ordering and causality. Consider two users on a social media platform. One user posts a comment (Event A), and another user replies (Event B). Event B depends on Event A and must occur after it. These two events are causally related.

Conversely, concurrent events occur independently. If Peter and John comment on two different Tweets simultaneously, there is no causal relationship between them. We need to identify dependencies where they exist.

Note: Causality can be modeled by encoding dependencies in a graph structure or by maintaining a separate logical time mechanism. In many designs, it is preferable to use an identifier that both guarantees uniqueness and preserves event ordering.

The following slides visualize concurrent and non-concurrent events.

canvasAnimation-image
1 / 5

Some applications require identifiers to encode ordering or causality information. For example, a key-value store can use time-ordered identifiers to implement a “last-write-wins” policy for resolving concurrent writes.

We can use logical or physical clocks to infer causality. Some systems, like financial applications under MiFID regulations, require strict wall-clock synchronization (within 100 microseconds of UTC) to detect anomalies in high-speed trading.

Note: Logical and physical clocks involve several non-trivial design trade-offs. Review the section titled “Time in a Distributed System” for a recap of these concepts.

Timestamps are commonly used to determine event ordering. For example, if one event occurs at 6:00 a.m. and another at 7:00 a.m., their timestamps establish the order. However, relying on physical clocks in distributed systems is challenging due to clock drift, skew, and synchronization limits.

Use UNIX time stamps

UNIX timestamps offer millisecond granularity. An ID-generating server producing one ID per millisecond yields 1,000 identifiers per second. This results in 24 (hour)×60 (min/hour)×60 (sec/min)×1000 (ID/sec)=86,400,000 IDs24 \ (\text{hour}) \times 60 \ (\text{min/hour}) \times 60 \ (\text{sec/min}) \times 1000 \ (\text{ID/sec}) = 86{,}400{,}000 \ \text{IDs} in a day, less than a billion per day.

Note: Connect to the following terminal to view the ...