More on the Architecture of Kafka

In this lesson, we'll continue learning about the architecture of Kafka.

Commit

For each consumer, Kafka stores the offset for each partition. This offset indicates which record in the partition the consumer read and processed last. It helps Kafka to ensure that each record is eventually handled.
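As a rough mental model, this bookkeeping can be pictured as a table keyed by consumer and partition. The sketch below is a toy in-memory illustration, not Kafka's actual storage or client API; the names `committed_offsets`, `commit`, and `unprocessed`, and the consumer names, are made up for this example.

```python
# Toy model: (consumer, partition) -> offset of the last processed record.
# This only illustrates the idea; it is not how Kafka stores offsets.
committed_offsets = {}

def commit(consumer, partition, offset):
    """Remember the last record this consumer processed in this partition."""
    committed_offsets[(consumer, partition)] = offset

def unprocessed(consumer, partition, log):
    """Return the records this consumer still has to handle."""
    last = committed_offsets.get((consumer, partition), -1)  # -1: nothing processed yet
    return log[last + 1:]

log = ["a", "b", "c", "d"]           # records of partition 0, offsets 0..3
commit("billing", 0, 1)              # "billing" has processed offsets 0 and 1
print(unprocessed("billing", 0, log))   # records after offset 1
print(unprocessed("shipping", 0, log))  # "shipping" has not committed anything
```

Because each consumer has its own entry, two consumers can read the same partition at different speeds without affecting each other.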

Once consumers have processed a record, they commit a new offset. This way, Kafka always knows which records each consumer has processed and which are still outstanding. Consumers can, however, commit an offset before the records are actually processed. If the consumer then fails before finishing the work, those records are never processed.
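The risk of committing too early can be illustrated with a small simulation. This is a minimal sketch using a toy in-memory partition, not the real Kafka client; `Partition`, `consume_commit_first`, and the `crash_at` parameter are hypothetical names introduced for this example. Here `committed` holds the position at which a restarted consumer resumes.

```python
class Partition:
    """Toy stand-in for one Kafka partition (not the real client API)."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = 0  # position where a restarted consumer resumes

def consume_commit_first(partition, processed, crash_at=None):
    """Commit each offset BEFORE processing it; optionally crash at one offset."""
    offset = partition.committed
    while offset < len(partition.records):
        partition.committed = offset + 1   # commit first ...
        if offset == crash_at:
            return False                   # ... then crash before processing
        processed.append(partition.records[offset])
        offset += 1
    return True

partition = Partition(["r0", "r1", "r2", "r3"])
processed = []
consume_commit_first(partition, processed, crash_at=2)  # fails while handling r2
consume_commit_first(partition, processed)              # restart after the crash
print(processed)  # r2 was committed but never processed
```

After the restart, the consumer resumes behind `r2`, so that record is skipped for good. Committing only after processing avoids this loss.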

The commit is on an offset, for example, “all records up to record 10 in this partition have been processed.” A consumer can commit a batch of records, which results in better performance because fewer commits are required.

Batch commits can, however, lead to duplicates. This happens when the consumer fails after processing part of a batch but before committing the batch. On restart, the application reads the complete batch again, because Kafka resumes at the last committed offset, which is the beginning of the batch.
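The following sketch simulates this failure mode. It is a toy model with plain Python lists rather than the Kafka client; `consume_in_batches` and its `crash_after` parameter are hypothetical and exist only to trigger a failure in the middle of a batch.

```python
def consume_in_batches(records, committed, batch_size, processed, crash_after=None):
    """Process records batch by batch and commit once per batch.

    `committed` is the position where reading resumes. If `crash_after` is
    set, the consumer "fails" after processing that many records in this run.
    Returns the committed position.
    """
    handled = 0
    while committed < len(records):
        batch = records[committed:committed + batch_size]
        for record in batch:
            if crash_after is not None and handled == crash_after:
                return committed           # crash: current batch is not committed
            processed.append(record)
            handled += 1
        committed += len(batch)            # a single commit covers the whole batch
    return committed

records = ["r0", "r1", "r2", "r3", "r4", "r5"]
processed = []
committed = consume_in_batches(records, 0, 3, processed, crash_after=4)
committed = consume_in_batches(records, committed, 3, processed)  # restart
print(processed)  # r3 appears twice
```

The first run commits the batch `r0` to `r2`, processes `r3`, and then fails. The restart begins again at `r3`, so that record is handled twice. Smaller batches shrink the window for duplicates at the cost of more commits.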

Kafka also supports exactly-once semantics, that is, guaranteed one-time delivery.
