Transactions, Storage Layout, and Other Guarantees
Explore how Kafka provides transactional guarantees and manages storage layout for distributed messaging. Learn about its two-phase commit protocol, exactly-once semantics, message ordering per partition, and performance optimizations to maintain durability and consistency.
Transactional client
Kafka provides a transactional client that allows producers to produce messages to multiple partitions of a topic atomically.
A transactional client also makes it possible to commit consumer offsets for a source topic and produce messages to a destination topic atomically. This makes it possible to provide exactly-once guarantees for an end-to-end pipeline. It is achieved through a two-phase commit protocol in which the brokers of the cluster play the role of transaction coordinator. The coordinator role is made highly available using the same underlying mechanisms Kafka already uses for partitioning, leader election, and fault-tolerant replication.
The coordinator stores the status of a transaction in a separate log. The messages contained in a transaction are stored in their own partitions as usual.
When a transaction is committed, the coordinator is responsible for writing a commit marker to the partitions containing messages of the transaction and to the partitions storing the consumer offsets.
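The commit flow described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not Kafka's implementation: the class name ToyCoordinator, the partition names, and the status strings (ongoing, prepare_commit, complete_commit) are hypothetical and chosen for readability.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCoordinator:
    """Hypothetical in-memory stand-in for a transaction coordinator."""

    txn_log: list = field(default_factory=list)     # separate log of transaction status
    partitions: dict = field(default_factory=dict)  # partition name -> list of records
    touched: dict = field(default_factory=dict)     # txn id -> partitions it wrote to

    def begin(self, txn_id):
        self.touched[txn_id] = []
        self.txn_log.append((txn_id, "ongoing"))

    def append(self, txn_id, partition, value):
        # Data records land in their own partitions as usual, tagged with the txn id.
        self.partitions.setdefault(partition, []).append((txn_id, "data", value))
        if partition not in self.touched[txn_id]:
            self.touched[txn_id].append(partition)

    def commit(self, txn_id):
        # Phase 1: record the commit decision in the transaction log.
        self.txn_log.append((txn_id, "prepare_commit"))
        # Phase 2: write a commit marker to every partition the transaction
        # touched, including the partition that stores consumer offsets.
        for partition in self.touched[txn_id]:
            self.partitions[partition].append((txn_id, "marker", "commit"))
        self.txn_log.append((txn_id, "complete_commit"))


coordinator = ToyCoordinator()
coordinator.begin("txn-1")
coordinator.append("txn-1", "destination-0", "result-a")
coordinator.append("txn-1", "destination-1", "result-b")
coordinator.append("txn-1", "offsets-0", "source-0 -> offset 42")
coordinator.commit("txn-1")

for name, records in coordinator.partitions.items():
    print(name, records)
```

Note that the transaction status lives only in txn_log, while the data records and the commit markers sit side by side in the ordinary partitions. That separation is what lets a reader of a partition decide locally whether a transactional message is committed.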
Consumers can also specify the isolation level they want to read under: read_committed or read_uncommitted. Under read_committed, messages that are part of a transaction become readable from a partition only after a commit marker has been written for that transaction. This interaction is summarised in the following illustration:
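In code, the difference between the two isolation levels can be sketched as a filter over the records of a single partition. The function name visible_values and the (transaction id, kind, value) record shape are assumptions made for this example. The sketch also deliberately ignores offset-ordering rules that a real broker and consumer enforce; it only shows the marker-based visibility rule.

```python
def visible_values(partition_log, isolation_level="read_committed"):
    """Return the data values a consumer would see in this simplified model."""
    data = [(txn, value) for txn, kind, value in partition_log if kind == "data"]

    if isolation_level == "read_uncommitted":
        # Every data record is visible, whether or not its transaction committed.
        return [value for _, value in data]

    if isolation_level == "read_committed":
        # Only records whose transaction has a commit marker in this partition.
        committed = {
            txn
            for txn, kind, value in partition_log
            if kind == "marker" and value == "commit"
        }
        return [value for txn, value in data if txn in committed]

    raise ValueError(f"unknown isolation level: {isolation_level!r}")


log = [
    ("txn-1", "data", "a"),
    ("txn-2", "data", "b"),          # txn-2 is still open: no marker yet
    ("txn-1", "marker", "commit"),
]

print(visible_values(log, "read_uncommitted"))  # ['a', 'b']
print(visible_values(log, "read_committed"))    # ['a']
```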
Physical storage of Kafka
The physical storage layout of Kafka is simple, as shown in the following illustration. Every log partition is implemented as a set of segment files of approximately the same size (e.g., 1 GB).
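The segment idea can be illustrated with a small in-memory model in which a partition is a list of segments and a new segment is started once the current one would exceed a size limit. The class name SegmentedLog, the max_segment_bytes parameter, and the roll-over rule are assumptions made for this sketch; real segments are files on disk and are far larger than the 10-byte limit used here.

```python
class SegmentedLog:
    """Hypothetical model of one partition stored as size-bounded segments."""

    def __init__(self, max_segment_bytes):
        self.max_segment_bytes = max_segment_bytes
        self.segments = [[]]   # each segment is a list of messages (bytes)
        self.active_bytes = 0  # bytes already written to the last segment
        self.next_offset = 0   # logical offset assigned to the next message

    def append(self, message):
        # Start a new segment when the message would not fit in the last one.
        would_overflow = self.active_bytes + len(message) > self.max_segment_bytes
        if self.active_bytes > 0 and would_overflow:
            self.segments.append([])
            self.active_bytes = 0
        # Messages are only ever appended to the last segment.
        self.segments[-1].append(message)
        self.active_bytes += len(message)
        offset = self.next_offset
        self.next_offset += 1
        return offset


log = SegmentedLog(max_segment_bytes=10)
for payload in [b"aaaa", b"bbbb", b"cccc", b"dddd"]:
    log.append(payload)

print([len(segment) for segment in log.segments])  # [2, 2]
```

Because older segments are never modified once a newer one exists, they can be treated as immutable units in this model, which keeps the append path limited to a single segment.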
Every time a producer publishes a message to a partition, the broker appends the message to the last segment file. For better ...