Apache Kafka Architecture

Learn about the high-level architecture of Apache Kafka and its open-source ecosystem, and perform basic operations using the Kafka CLI.

Apache Kafka: key components

Apache Kafka is a distributed system and has multiple components that coordinate and work together.

Apache Kafka: high-level architecture

Here are some key components of Kafka.

Brokers

A broker, also known as a node or server, is a fundamental building block that runs the main Kafka process. A production-grade deployment requires three or more brokers to achieve high availability and scalability. Brokers are clustered together; for each partition of data (covered below), one broker acts as the leader, and one or more follower brokers replicate data from it.

Client application

Producer applications send messages to Kafka; each message is composed of a key and a value (the key may be null). Consumer applications, meanwhile, subscribe to topics to retrieve and handle these events. Producers and consumers operate independently and without knowledge of one another: producers can write events to Kafka without being aware of who consumes them, and consumers can read events from Kafka without knowing who produced them.
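
To make this concrete, here is a minimal sketch of a Java producer, assuming a broker is reachable at localhost:9092 and that the topic test exists (both names are illustrative):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The producer needs no knowledge of whoever consumes these events.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("user-1") is optional and may be null.
            producer.send(new ProducerRecord<>("test", "user-1", "hello kafka"));
        }
    }
}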

Topic

Messages sent by producers are stored and organized into topics, with each message appended to the end of the topic, much like entries in a commit log. A topic can have multiple producers that send events to it, as well as numerous consumers that subscribe to those events. Unlike conventional messaging systems, events in a topic can be read multiple times because they are not deleted after being consumed. Instead, a per-topic retention configuration determines how long Kafka should store the events, after which older events are discarded.
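
As an illustration of per-topic retention, the sketch below uses the Java AdminClient to create a topic whose events are retained for seven days via the retention.ms setting; the topic name (orders) and retention value are arbitrary assumptions:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class TopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 2 partitions, replication factor 1, events retained for 7 days
            NewTopic topic = new NewTopic("orders", 2, (short) 1)
                    .configs(Map.of("retention.ms", "604800000"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}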

Partition

Each topic comprises one or more partitions, with each message assigned to a specific partition via a hash function applied to the key (when the key is null, messages are distributed across partitions in a round-robin fashion instead). The Kafka cluster distributes these partitions across its brokers and replicates each partition according to the topic's replication factor. This distributed placement of data is critical for scalability, because it enables client applications to read from and write to multiple brokers simultaneously.
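
One way to observe partition assignment is through the metadata returned by the producer's send() call. In the sketch below (assuming the two-partition test topic created later in this lesson), records that share a key should land on the same partition:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class PartitionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String key : new String[]{"user-1", "user-2", "user-1"}) {
                RecordMetadata meta =
                        producer.send(new ProducerRecord<>("test", key, "event")).get();
                // Records with the same key hash to the same partition.
                System.out.printf("key=%s -> partition %d, offset %d%n",
                        key, meta.partition(), meta.offset());
            }
        }
    }
}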

The Apache Kafka ecosystem

Kafka is not just a stand-alone piece of software. One of the key reasons behind its popularity and adoption is its rich ecosystem of open-source libraries and tools. Some are part of the core Kafka project (for example, Kafka Connect), while others are independent projects that integrate with and are designed to work on top of Kafka (for example, Strimzi).

Here is an overview of the open-source components that are part of the core Kafka project. Most of these will be covered in this course.

  • Kafka Client APIs: These are the Producer and Consumer APIs that allow client applications to write data to and read data from Kafka topics. These APIs are a foundational part of the overall ecosystem, since the other components described in this section are built on top of them. Kafka client libraries exist in multiple programming languages, but the Java client is part of the core Kafka project. (A minimal consumer sketch follows this list.)

  • Kafka Connect: This provides a high-level framework for building connectors that integrate Kafka with other systems such as databases, object storage, etc. Source connectors move data from source systems to Kafka topics, and sink connectors move data from Kafka topics into target systems. The JDBC connector and Debezium are examples of widely used connectors.

  • Kafka Streams: This is a Java library that allows us to execute stateless and stateful computations (map, filter, join, aggregations, etc.) on streaming data flowing in and out of Kafka topics. It provides high-level APIs (the DSL and the Processor API) that we can use to create topologies that execute these computations. (A small example topology appears after this list.)

  • Kafka MirrorMaker: This is a tool for copying data between two Apache Kafka clusters. Data is read from topics in the source cluster and written to a topic with the same name in the destination cluster.
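
As referenced in the Kafka Client APIs item above, here is a minimal sketch of a Java consumer that subscribes to a topic and prints every record it receives; the group ID (demo-group) and topic name (test) are illustrative assumptions:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Start from the earliest offset when no committed offset exists.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}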
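
And for the Kafka Streams item, here is a rough sketch of a DSL topology that filters out null values and uppercases the rest; the topic names input-topic and output-topic are placeholders:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class StreamsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Build a topology: read, filter, transform, and write back to Kafka.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.filter((key, value) -> value != null)
             .mapValues(value -> value.toUpperCase())
             .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}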

Getting started with Kafka using the CLI

The Apache Kafka distribution includes a handy CLI. Let’s conclude this introductory lesson with a short tutorial to reinforce some of the core Kafka concepts, like the producer, consumer, and topic.

To start the Apache Kafka broker, click "Click to Connect..." in the terminal below. Once the broker is running, follow these steps:


Creating a topic

Click the “+” button to open a new terminal tab and enter the following commands:

cd /app/confluent-7.3.1/bin
./kafka-topics --bootstrap-server localhost:9092 --create --partitions 2 --replication-factor 1 --topic test

Once the topic is created, you should see the confirmation message Created topic test.

Starting the consumer CLI

Click the “+” button to open a new terminal tab and enter the following commands:

cd /app/confluent-7.3.1/bin
./kafka-console-consumer --bootstrap-server localhost:9092 --topic test

Invoking the producer CLI

Click the “+” button to open a new terminal tab and enter the following commands:

cd /app/confluent-7.3.1/bin
./kafka-console-producer --bootstrap-server localhost:9092 --topic test

You should see a > prompt when the producer is ready. Enter the messages you want to send to the topic (press the “Enter” key after each message).

Go back to the consumer terminal to verify that the messages were received.

Conclusion

In this lesson, we covered the high-level architecture of Apache Kafka, along with its open-source ecosystem. We also used the Kafka CLI to try out basic operations like creating a topic and producing and consuming messages from it.