Components of Kafka

Explore the fundamental components of Apache Kafka including messages, topics, partitions, brokers, producers, and consumers. Learn how these elements interact to enable scalable and reliable data pipeline construction using Kafka's architecture.

Let’s discuss the various components:

Message

A message is simply an array of bytes from Kafka’s perspective. It doesn’t assign any meaning or interpretation to the contents of the message. A message is also the unit of data in the Kafka ecosystem. It is akin to a row or record in a table in a relational database.

  • Messages are batched rather than being sent individually to reduce overhead. This leads to the classical tradeoff between latency and throughput. As batch sizes grow larger, throughput increases as more messages are handled per unit of time. At the same time, it takes longer for an individual message to be delivered thus increasing latency.
  • Messages batched together are compressed for efficient data transfer.
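The effect of batching and compressing together can be sketched with the standard library alone. This is not Kafka's wire format; the helper names `encode_batch` and `decode_batch` and the length-prefix framing are assumptions made for illustration.

```python
import zlib

def encode_batch(messages: list[bytes]) -> bytes:
    """Length-prefix each message, concatenate them, and compress the whole batch."""
    framed = b"".join(len(m).to_bytes(4, "big") + m for m in messages)
    return zlib.compress(framed)

def decode_batch(payload: bytes) -> list[bytes]:
    """Reverse encode_batch: decompress, then split on the 4-byte length prefixes."""
    framed = zlib.decompress(payload)
    messages, pos = [], 0
    while pos < len(framed):
        size = int.from_bytes(framed[pos:pos + 4], "big")
        pos += 4
        messages.append(framed[pos:pos + size])
        pos += size
    return messages

# 100 similar messages: compressing them as one batch exploits the repetition
# between messages, which compressing each message on its own cannot do.
messages = [b'{"user": "alice", "action": "click"}'] * 100
batch = encode_batch(messages)
individually = sum(len(zlib.compress(m)) for m in messages)
print(len(batch), "bytes as one batch vs", individually, "bytes sent one by one")
```

The price of the smaller payload is that the first message in the batch waits until the batch is full (or a timer fires) before it is sent, which is the latency side of the tradeoff.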

Topic

Messages are written to and read from topics. A topic can be thought of as analogous to a directory or folder in an operating system.

Partition

A topic has subdivisions known as partitions. There are a number of salient points to remember about partitions:

  • Messages are ordered only within a partition, in the order in which they were appended, and not across the entire topic.
  • Messages are read from beginning to end in a partition.
  • Messages can only be appended to the end of a partition. (Notice the similarity to a commit log.)
  • Partitions allow Kafka to scale horizontally and also provide redundancy. Each partition can be hosted on a different server, so a topic can be spread across several machines and partitions can be added to it as the load on the system increases.
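The append-only, read-in-order behaviour of a partition can be modelled as a simple in-memory log. This is a minimal sketch, not how a broker stores data; the `Partition` class and its method names are hypothetical.

```python
class Partition:
    """A toy append-only log: messages are only added at the end, never modified."""

    def __init__(self) -> None:
        self._log: list[bytes] = []

    def append(self, message: bytes) -> int:
        """Append a message to the end of the log and return its position."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, start: int, max_messages: int = 10) -> list[tuple[int, bytes]]:
        """Return (position, message) pairs in order, beginning at `start`."""
        chunk = self._log[start:start + max_messages]
        return list(enumerate(chunk, start=start))

partition = Partition()
for payload in (b"first", b"second", b"third"):
    partition.append(payload)

# Reading always walks forward through the log in append order.
print(partition.read(0))
```

A topic with several partitions would hold several such logs, each ordered on its own, which is why ordering is not guaranteed across the topic as a whole.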

Message key

We mentioned earlier that a message gets written to a topic. Yet since partitions make up a topic, messages in reality get written to and read from partitions. We can control which partition a message lands in with the use of message keys. A message key is treated like a byte array by Kafka and is optional. Say there are five partitions within a topic. One scheme might be to compute modulo five of the numeric value represented by the key and direct the message to the partition number equal to the result of the operation. Thus, as long as the number of partitions stays the same, all messages with the same key are written to the same partition.
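The modulo scheme described above might look like the following sketch. The function name `partition_for` is hypothetical, and interpreting the key bytes as a big-endian integer is one assumed way to obtain a "numeric value"; Kafka's built-in partitioner hashes the key instead of reading it as a number.

```python
def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition by treating its bytes as an integer, modulo the partition count."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return int.from_bytes(key, "big") % num_partitions

NUM_PARTITIONS = 5

# The same key always maps to the same partition while the partition count is fixed.
for key in (b"user-1", b"user-2", b"user-1"):
    print(key, "->", partition_for(key, NUM_PARTITIONS))
```

Changing `NUM_PARTITIONS` changes the result of the modulo, which is why adding partitions to a topic can send an existing key to a different partition than before.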

Message offset

Each message also has a piece of metadata associated with it called the offset. The offset is an ever-increasing integer value that determines the order of the message within a partition. By remembering the offset of the last message it processed, a consumer is able to continue from where it previously left off.
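Resuming from a remembered offset can be illustrated with a toy consumer reading from a list that stands in for a partition. The `Consumer` class and its `poll` method are hypothetical names for this sketch and are not the Kafka client API.

```python
# A list stands in for one partition; a message's index is its offset.
log = [b"m0", b"m1", b"m2", b"m3", b"m4"]

class Consumer:
    """Reads forward through a log, tracking the offset of the next message to read."""

    def __init__(self, committed_offset: int = 0) -> None:
        self.position = committed_offset

    def poll(self, log: list[bytes], max_messages: int) -> list[bytes]:
        """Return up to max_messages starting at the current position, then advance."""
        batch = log[self.position:self.position + max_messages]
        self.position += len(batch)
        return batch

first = Consumer()
seen = first.poll(log, 2)      # reads offsets 0 and 1
committed = first.position     # the offset to remember before shutting down

# A restarted consumer picks up at the remembered offset instead of at 0.
restarted = Consumer(committed)
rest = restarted.poll(log, 10)
print(seen, committed, rest)
```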

Schemas

Kafka doesn’t mandate that messages conform to any given schema; as noted earlier, messages are opaque byte arrays from Kafka’s perspective. However, it is recommended that messages follow a structure and form that allows for easy understanding. Messages can be expressed in JSON, XML, Avro, or other formats. It is also important to anticipate and mitigate the issues that arise from schema changes. For example, message writers should only write messages in the new format/schema once the message readers have been updated with the new schema. Avro provides support for schema evolution and allows for backward and forward compatibility.
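Backward and forward compatibility can be illustrated with plain JSON. This is a hand-rolled sketch rather than Avro; the field names `user` and `region` and the reader functions are assumptions made for the example.

```python
import json

def read_v1(raw: bytes) -> dict:
    """Old reader: knows only `user` and ignores any fields it does not recognise."""
    record = json.loads(raw)
    return {"user": record["user"]}

def read_v2(raw: bytes) -> dict:
    """New reader: also understands `region`, with a default for older messages."""
    record = json.loads(raw)
    return {"user": record["user"], "region": record.get("region", "unknown")}

old_message = json.dumps({"user": "alice"}).encode()                 # written with schema v1
new_message = json.dumps({"user": "bob", "region": "eu"}).encode()   # written with schema v2

# Backward compatible: the new reader can still process old messages.
print(read_v2(old_message))
# Forward compatible: the old reader can still process new messages.
print(read_v1(new_message))
```

Both directions work here only because the new field is optional and has a default. Removing or renaming `user` would break the old reader, which is the kind of change that needs readers to be updated first.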

Brokers

A single Kafka server is called a broker. Usually, several Kafka brokers operate as one Kafka cluster. The cluster is controlled by one of the brokers, called the controller, which is responsible for administrative actions such as assigning partitions to other brokers and monitoring for failures. The controller is elected from the live members of the cluster.

A partition can be assigned to more than one broker, in which case the partition is replicated across the assigned brokers. This creates redundancy in case one of the brokers fails and allows another broker to take its place without disrupting access to the partition for the users. Within a cluster, a single broker owns a partition and is said to be the leader. All the other partition-replicating brokers are called followers. Producers writing to the partition must connect to its leader, and consumers do so by default; Kafka 2.4 and later can also be configured to let consumers fetch from a follower.
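The leader and follower roles, and what happens when the leader's broker fails, can be sketched as follows. The `ReplicatedPartition` class and broker names are hypothetical, and promoting the first surviving replica is a simplification: in Kafka the new leader is chosen from replicas that are in sync with the old one.

```python
class ReplicatedPartition:
    """Tracks which brokers hold a copy of a partition and which one leads it."""

    def __init__(self, replicas: list[str]) -> None:
        if not replicas:
            raise ValueError("a partition needs at least one replica")
        self.replicas = list(replicas)
        self.leader = self.replicas[0]

    @property
    def followers(self) -> list[str]:
        """Every replica other than the current leader."""
        return [b for b in self.replicas if b != self.leader]

    def broker_failed(self, broker: str) -> None:
        """Drop a failed broker; if it was the leader, promote a surviving replica."""
        self.replicas.remove(broker)
        if broker == self.leader:
            if not self.replicas:
                raise RuntimeError("no replicas left; partition is unavailable")
            self.leader = self.replicas[0]

partition = ReplicatedPartition(["broker-1", "broker-2", "broker-3"])
print(partition.leader, partition.followers)

partition.broker_failed("broker-1")  # the leader goes down
print(partition.leader, partition.followers)
```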

Messages in Kafka are stored durably for a configurable retention period. Messages can be retained for a certain number of days or until the topic reaches a specific size in bytes, after which the oldest messages are expired and deleted.
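The two retention rules, by age and by size, can be sketched as a function over a list of timestamped messages. The function `apply_retention` and its parameters are hypothetical, and the sketch expires individual messages for simplicity, whereas a real broker removes data in larger chunks.

```python
def apply_retention(
    log: list[tuple[float, bytes]],
    now: float,
    max_age_seconds: float,
    max_bytes: int,
) -> list[tuple[float, bytes]]:
    """Return the messages that survive retention; `log` is (timestamp, payload), oldest first."""
    # Time-based rule: drop anything older than the retention period.
    kept = [(ts, payload) for ts, payload in log if now - ts <= max_age_seconds]

    # Size-based rule: drop the oldest messages until the total size fits the limit.
    total = sum(len(payload) for _, payload in kept)
    while kept and total > max_bytes:
        _, payload = kept.pop(0)
        total -= len(payload)
    return kept

log = [(0.0, b"aaaa"), (50.0, b"bbbb"), (90.0, b"cccc")]

# At time 100 with a 60-second retention period, the message from time 0 has expired.
print(apply_retention(log, now=100.0, max_age_seconds=60.0, max_bytes=100))
# With a 4-byte size limit as well, only the newest message fits.
print(apply_retention(log, now=100.0, max_age_seconds=60.0, max_bytes=4))
```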

A broker is responsible for receiving messages from producers and committing them to disk. Similarly, brokers also receive requests from readers and respond with messages fetched from partitions.

Producers

Producers create messages and are sometimes known as writers or publishers. Producers can direct messages to specific partitions using the message key and implement complex rules for partition assignment using a custom partitioner.
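A custom partitioning rule might, for example, reserve one partition for a single high-volume key and spread all other keys over the rest. The sketch below is plain Python and not the Kafka partitioner interface; `VIP_KEY`, `custom_partition`, and the choice of CRC32 as the hash are assumptions for illustration.

```python
import zlib

VIP_KEY = b"vip-customer"  # hypothetical key that gets a partition to itself

def custom_partition(key: bytes, num_partitions: int) -> int:
    """Send VIP_KEY to the last partition; hash every other key over the remaining ones."""
    if num_partitions < 2:
        raise ValueError("need at least two partitions to reserve one")
    if key == VIP_KEY:
        return num_partitions - 1
    # zlib.crc32 is deterministic across runs, so a key keeps its partition.
    return zlib.crc32(key) % (num_partitions - 1)

for key in (b"vip-customer", b"alice", b"bob"):
    print(key, "->", custom_partition(key, 5))
```

The rule keeps the property that a given key always lands in the same partition, while preventing the high-volume key from crowding out the others.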

Consumers

Consumers read messages and are sometimes known as subscribers or readers. Consumers operate as a group called the consumer group. A consumer group consists of several consumers working together to read a topic. It is also possible for a consumer group to have a single consumer in it. Each partition is read by a single member of the group, though a single consumer can read multiple partitions. The mapping of a consumer to a partition is called the ownership of the partition by the consumer. If a consumer fails, the remaining consumers in the group will rebalance the partitions amongst themselves to make up for the failed member.
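Partition ownership and rebalancing can be sketched with a round-robin assignment that is simply recomputed when the group membership changes. The `assign` function and consumer names are hypothetical; Kafka offers several assignment strategies, and this is only one simple possibility.

```python
def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Deal partitions out to consumers round-robin; each partition gets exactly one owner."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment

partitions = [0, 1, 2, 3, 4]

# Three consumers share five partitions; some consumers own more than one.
before = assign(partitions, ["c1", "c2", "c3"])
print(before)

# Consumer c2 fails: the surviving members take over its partitions.
after = assign(partitions, ["c1", "c3"])
print(after)
```

Because every partition has exactly one owner within the group, adding more consumers than there are partitions would leave the extra consumers with nothing to read.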