Detailed Design of Kafka

Understand the detailed design of each component in Kafka.

Kafka is a messaging system built around three roles. A sender publishes messages (data) without addressing a specific receiver; instead, it assigns each message a type. A receiver subscribes to the types of messages it is interested in. A broker sits between the two and facilitates both publishing and subscribing.
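To make the sender/receiver/broker relationship concrete, here is a minimal in-memory sketch in Python. The `ToyBroker` class and its `publish`/`subscribe` methods are hypothetical names, not Kafka's API. For brevity the sketch assumes a push model with callbacks; real Kafka consumers pull messages from brokers, which store them durably.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


# Hypothetical in-memory "broker": it only illustrates type-based
# publish/subscribe and is not how a real Kafka broker works.
class ToyBroker:
    def __init__(self) -> None:
        # Map each message type (topic) to the handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        """A receiver registers interest in a type, not in a particular sender."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> None:
        """A sender addresses a type, not a particular receiver."""
        for handler in self._subscribers[topic]:
            handler(message)


broker = ToyBroker()
received: List[str] = []
broker.subscribe("orders", received.append)
broker.publish("orders", "order #1 created")
broker.publish("payments", "payment #7 settled")  # no subscriber, so nobody receives it
print(received)  # ['order #1 created']
```

The sender never names a receiver: it only names the type, and the broker decides who gets the message.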

Kafka’s Architecture

Kafka is composed of three main components, which are described as follows.

Producers

A producer is responsible for creating new messages. It contains the following components:

  • Producer record: Each message the producer creates is wrapped in a producer record. A record contains the topic the message should be sent to, a key and/or a partition, and a value (the message payload).

  • Serializer: The producer first converts the key and value of a message into byte arrays so they can be sent over the network (this process is sometimes called marshaling).

  • Partitioner: After serialization, the message goes to the partitioner, which determines the partition of the topic that the message should be assigned to. A partitioner can operate on messages in a few ways:

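    • If we specify a key in the producer record, the partitioner applies a hash function to the message key and maps the result to a specific partition. As a result, messages with the same key always land in the same partition, as long as the number of partitions does not change.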

    • If we specify a partition in the producer record, the partitioner is bypassed, and the message is assigned to that partition.
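The producer-side steps above can be sketched in Python as follows. This is a simplified model under stated assumptions, not Kafka's client code: `ProducerRecord`, `serialize`, `SimplePartitioner`, and `prepare` are hypothetical names; keys and values are serialized as UTF-8; CRC-32 from `zlib` stands in for the murmur2 hash that Kafka's Java client uses for keyed records; and records with neither a key nor a partition are spread round-robin, which is an assumption (newer Kafka clients use a "sticky" strategy instead).

```python
import zlib
from dataclasses import dataclass
from itertools import count
from typing import Optional, Tuple


@dataclass
class ProducerRecord:
    """Hypothetical, simplified record: topic, value, optional key and partition."""
    topic: str
    value: str
    key: Optional[str] = None
    partition: Optional[int] = None


def serialize(text: Optional[str]) -> Optional[bytes]:
    """Serializer step: turn a key or value into a byte array (UTF-8 here)."""
    return None if text is None else text.encode("utf-8")


class SimplePartitioner:
    """Hypothetical partitioner; CRC-32 stands in for Kafka's murmur2 hash."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._next = count()  # round-robin counter for records with no key (assumption)

    def partition(self, record: ProducerRecord, key_bytes: Optional[bytes]) -> int:
        if record.partition is not None:
            # Explicit partition: no hashing needed, use it as-is.
            return record.partition
        if key_bytes is not None:
            # Keyed record: hash the serialized key, then map it onto a partition.
            return zlib.crc32(key_bytes) % self.num_partitions
        # Neither key nor partition: rotate through the partitions.
        return next(self._next) % self.num_partitions


def prepare(
    record: ProducerRecord, partitioner: SimplePartitioner
) -> Tuple[str, int, Optional[bytes], Optional[bytes]]:
    """Serialize first, then ask the partitioner where the record should go."""
    key_bytes = serialize(record.key)
    value_bytes = serialize(record.value)
    return record.topic, partitioner.partition(record, key_bytes), key_bytes, value_bytes


partitioner = SimplePartitioner(num_partitions=3)
a = prepare(ProducerRecord("orders", "created", key="customer-42"), partitioner)
b = prepare(ProducerRecord("orders", "shipped", key="customer-42"), partitioner)
c = prepare(ProducerRecord("orders", "audit", partition=2), partitioner)
print(a[1] == b[1])  # True: same key, same partition
print(c[1])          # 2: the explicit partition is used as-is
```

Because the partition is computed from the serialized key, every record for `customer-42` goes to the same partition; if the partition count changes, the key-to-partition mapping changes with it.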

By default, each topic has one partition. However, the number of partitions is a parameter that the user can change, and each partition gets its own ID. A good practice is to choose a number of partitions equal to, or a multiple of, the number of brokers in the cluster, so that the partitions can be distributed evenly across the brokers.
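The arithmetic behind this advice can be checked with a short Python sketch. It assumes a simplified placement in which partition `p` goes to broker `p % num_brokers` and ignores replication; Kafka's real assignment also accounts for replicas and racks. `partitions_per_broker` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict


def partitions_per_broker(num_partitions: int, num_brokers: int) -> Dict[int, int]:
    """Count partitions per broker under simple round-robin placement (simplified)."""
    # Partition p is placed on broker p % num_brokers; replicas are ignored.
    return dict(Counter(p % num_brokers for p in range(num_partitions)))


print(partitions_per_broker(6, 3))  # {0: 2, 1: 2, 2: 2}
print(partitions_per_broker(7, 3))  # {0: 3, 1: 2, 2: 2}
```

With 6 partitions on 3 brokers, every broker holds 2 partitions. With 7, broker 0 holds an extra partition and therefore carries more load.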
