What is stream processing with Apache Kafka?
Kafka Streams is a client library for processing and analyzing data stored in Kafka. It builds on important stream processing concepts, such as separating event time from processing time, which makes it possible to process records within a specific time window, and supporting real-time querying of application state.
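To make the event-time idea concrete, here is a minimal, hypothetical Python sketch (not part of Kafka Streams itself) that groups records into 60-second tumbling windows by each record's *event time*, regardless of when the records are actually processed:

```python
from collections import defaultdict

WINDOW_SECONDS = 60

def tumbling_window_counts(events):
    """Count events per 60-second window keyed by event time,
    not by the time the events happen to be processed."""
    counts = defaultdict(int)
    for event in events:
        # Round the event's own timestamp down to the start of its window
        window_start = event["event_time"] - (event["event_time"] % WINDOW_SECONDS)
        counts[window_start] += 1
    return dict(counts)

events = [
    {"event_time": 5},   # occurred in window [0, 60)
    {"event_time": 61},  # occurred in window [60, 120)
    {"event_time": 59},  # arrives late, but still belongs to window [0, 60)
]
print(tumbling_window_counts(events))  # → {0: 2, 60: 1}
```

Because each record carries its own timestamp, a late-arriving record is still counted in the window where it actually occurred.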
Stream processing applications use the Kafka Streams library to process real-time data continuously, as soon as it is created or received.
Let's look at the illustration below to understand this better:
In the illustration above, one user acts as a producer and publishes an update, like a post on a social media website. The other two consumers consume it, similar to users reacting to the post on the platform. This is only possible for the consumers because, as soon as the producer publishes the record, stream processing makes it immediately consumable.
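The fan-out in the illustration can be mimicked in plain Python, with in-memory queues standing in for Kafka topics. This is a hypothetical sketch, not the Kafka API; `publish` and the queue names are illustrative only:

```python
import json
import queue

def publish(post, consumer_queues):
    """Fan a produced record out to every subscribed consumer,
    the way one Kafka record can be read by multiple consumers."""
    data = json.dumps(post).encode("ascii")  # serialize the record once
    for q in consumer_queues:
        q.put(data)  # each consumer gets its own copy to read

# Two consumers, each with its own queue (standing in for Kafka consumers)
consumers = [queue.Queue(), queue.Queue()]
publish({"user": "alice", "post": "hello"}, consumers)

# Both consumers see the same record as soon as it is produced
for q in consumers:
    record = json.loads(q.get().decode("ascii"))
    print(record["post"])
```

The key point the sketch captures is that producing happens once, while consuming happens independently per consumer.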
Let's look at an example consumer program, written in Python with the kafka-python library, to understand this better:
from kafka import KafkaConsumer
import json
import time
import random
bootstrap_servers = "0.0.0.0:9092"
topic = "educative"
auto_offset_reset = 'earliest'
consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap_servers,
                         auto_offset_reset=auto_offset_reset,
                         value_deserializer=lambda m: json.loads(m.decode('ascii')))
print("Consumer Started")
try:
    for message in consumer:
        print("Consumer received message = ", message)
        time.sleep(random.randint(2, 4))
except KeyboardInterrupt:
    print("Aborted by user...")
finally:
    consumer.close()
Explanation
The code above is the consumer file:
Lines 1–4: We import the KafkaConsumer client and the standard-library modules we need.
Lines 5–7: We configure the broker address, the topic name, and the offset-reset policy.
Lines 8–10: We create a consumer that subscribes to the topic and deserializes each JSON message it receives from the producer.
Lines 12–15: We loop over incoming messages, print each one, and sleep briefly to simulate processing time.
Lines 16–19: We stop cleanly when the user presses Ctrl+C and close the consumer.
The producer file, which stores messages in a queue and then picks each message from the queue and sends it to the topic, is not shown here.
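Since the producer file is not shown, here is a rough sketch only of what a producer consistent with the description above (messages staged in a queue, then sent to the same topic) might look like. The function names are hypothetical, and running it requires kafka-python plus a reachable broker:

```python
import json
import queue

def serialize(message):
    """Encode a message dict as ASCII JSON bytes,
    matching the consumer's value_deserializer."""
    return json.dumps(message).encode("ascii")

def main():
    # Imported here so serialize() can be reused without a broker installed
    from kafka import KafkaProducer

    # Stage the messages in a queue, as described in the explanation
    messages = queue.Queue()
    for i in range(3):
        messages.put({"id": i, "text": f"post {i}"})

    # Send every queued message to the same topic the consumer reads
    producer = KafkaProducer(bootstrap_servers="0.0.0.0:9092",
                             value_serializer=serialize)
    while not messages.empty():
        producer.send("educative", messages.get())
    producer.flush()
    producer.close()

if __name__ == "__main__":
    main()
```

The serializer must mirror the consumer's deserializer: the producer encodes each dict to ASCII JSON bytes, and the consumer decodes the bytes back into a dict.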