Distributed Coordination in Kafka

Explore Kafka's approach to distributed coordination, focusing on how consumers manage partition ownership and load balancing using ZooKeeper. Learn how Kafka achieves scalability and fault tolerance by managing consumer groups without a central manager, handling failovers, and executing rebalance processes to ensure efficient real-time data streaming.

A producer can publish a message to a partition in three ways: by naming a particular partition, by choosing a random partition, or by applying a partition function to the message key. In this lesson, we’ll see how a consumer interacts with multiple brokers in parallel.
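As a recap of the three producer-side strategies, here is a minimal sketch. It is illustrative only and is not Kafka's client API: the helper name `choose_partition`, its parameters, and the use of CRC32 as the partition function are all assumptions made for this example.

```python
import random
import zlib
from typing import Optional


def choose_partition(num_partitions: int,
                     partition: Optional[int] = None,
                     key: Optional[bytes] = None) -> int:
    """Pick a target partition (hypothetical helper, not a Kafka API)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    # 1. The producer names a particular partition explicitly.
    if partition is not None:
        if not 0 <= partition < num_partitions:
            raise ValueError("partition out of range")
        return partition

    # 2. A partition function is applied to the message key.
    #    CRC32 is an assumption here; any stable hash would serve.
    if key is not None:
        return zlib.crc32(key) % num_partitions

    # 3. Neither a partition nor a key was given: pick a random partition.
    return random.randrange(num_partitions)
```

Because the key-based branch uses a stable hash, messages with the same key always land in the same partition, as long as the number of partitions does not change.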

Coordination in consumer groups

Kafka includes several features that help it meet its scalability goals. They are explained below.

Goals of coordination

  1. Evenly distribute the messages (stored in the brokers) across multiple consumers.

  2. Incur the lowest possible coordination overhead.

Consumption of messages

Partitions are the smallest unit of parallelism in Kafka. Within a consumer group, each partition is consumed by only a single consumer at a time. This means that one consumer in the group consumes all the messages in a specific partition of a topic.

If multiple consumers were allowed to consume the same partition, they would have to communicate to decide which messages each of them consumes, incurring locking and state maintenance overhead.
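The single-owner rule can be sketched as a small assignment function. This is a hedged illustration, not Kafka's actual assignment logic: the name `assign_partitions` and the round-robin strategy are assumptions chosen to show both goals at once, an even spread and no per-message coordination.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int],
                      consumers: List[str]) -> Dict[str, List[int]]:
    """Give every partition exactly one owner (hypothetical helper)."""
    if not consumers:
        raise ValueError("at least one consumer is required")

    ordered_consumers = sorted(consumers)
    assignment: Dict[str, List[int]] = {c: [] for c in ordered_consumers}

    # Round-robin over the sorted partitions. The result depends only on the
    # two input lists, so every consumer that sees the same lists derives the
    # same ownership map without exchanging messages about individual records.
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered_consumers[index % len(ordered_consumers)]
        assignment[owner].append(partition)
    return assignment


print(assign_partitions([0, 1, 2, 3, 4], ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3]}
```

Each partition appears in exactly one consumer's list, so no locking is needed while messages are being read. Calling the function again with a different consumer list produces a fresh ownership map.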

Communication between consumers during message consumption

In Kafka, consumers need to coordinate only when a consumer fails and the remaining consumers must rebalance the load. To ...