Stateless Processing Introduction

Get introduced to the stateless topology we will be building.

In this chapter, we’ll learn how to choose and use stateless operators. We will do so by designing, building, and running a complete end-to-end Kafka Streams stateless topology. To refresh our memory, stateless operators are stream processors that do not require knowledge of past records, i.e., each record is processed independently of previous records. In particular, we will be using the following stateless operators from the Kafka Streams high-level domain-specific language (DSL):

  • filter: This is an operator used to select which records should be further processed.

  • map: This is used to transform the value of the incoming record to a different value (with possibly a new type).

  • flatMap: This is similar to map, but instead of transforming a single input record to a single output record, it transforms a single input record to zero, one, or many records.

  • split: This splits a stream into multiple streams.

  • merge: The opposite of split, this merges two streams into a single stream.
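
To make the semantics of the operators listed above concrete, here is a minimal sketch in plain Java. It is an analogy only: java.util.stream stands in for a Kafka Streams stream, and none of the calls below belong to the Kafka Streams API. The class and method names (StatelessOperatorsAnalogy, filterAtLeast, and so on) are hypothetical and chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: java.util.stream stands in for a Kafka Streams stream here.
// All names in this class are hypothetical.
class StatelessOperatorsAnalogy {

    // filter: keep only the records that satisfy a predicate.
    static List<Integer> filterAtLeast(List<Integer> seconds, int minimum) {
        return seconds.stream()
                .filter(s -> s >= minimum)
                .collect(Collectors.toList());
    }

    // map: exactly one output per input, possibly of a new type (Integer -> String).
    static List<String> mapToLabels(List<Integer> seconds) {
        return seconds.stream()
                .map(s -> s + "s")
                .collect(Collectors.toList());
    }

    // flatMap: zero, one, or many outputs per input (an empty line yields nothing).
    static List<String> flatMapToWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" "))
                        .filter(word -> !word.isEmpty()))
                .collect(Collectors.toList());
    }

    // split: route each record to one of two branches according to a predicate.
    static Map<Boolean, List<Integer>> splitByMinimum(List<Integer> seconds, int minimum) {
        return seconds.stream()
                .collect(Collectors.partitioningBy(s -> s >= minimum));
    }

    // merge: combine two streams into one. Stream.concat keeps the first list's
    // elements ahead of the second's; a merge of two live streams need not.
    static <T> List<T> merge(List<T> first, List<T> second) {
        return Stream.concat(first.stream(), second.stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filterAtLeast(List.of(10, 30, 45), 30));
        System.out.println(mapToLabels(List.of(30, 45)));
        System.out.println(flatMapToWords(List.of("happy sad", "", "calm")));
        System.out.println(merge(List.of(1), List.of(2, 3)));
    }
}
```

Each method looks only at the record in front of it, which is what makes these operations stateless: no result depends on a record seen earlier.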

We will also explore the important topics of configuration, serialization, and deserialization.

Introducing our stateless topology project

It is well known that music has a great effect on human feelings. In our stateless topology project, we will create a fictional application that analyzes and monitors its users’ feelings and well-being based on the music they listen to. We will start by defining the application’s requirements and then translating these requirements to a Kafka Streams processor topology. We’ll then build the topology together, step-by-step (or, more correctly, processor-by-processor).

Requirements

  • Our users are listening to music using a music-streaming service. Each time a track is listened to, its metadata is published to a Kafka topic in JSON format.

  • Tracks that have been listened to for less than 30 seconds should be disregarded.

  • Music can be instrumental (without lyrics) or noninstrumental.

  • We have at our disposal three APIs:

    • An API to get the lyrics of a track by its ID.

    • An API to get a list of feelings associated with a track by its ID based on the analysis of the music itself.

    • An API to get a list of feelings associated with a text.

  • Our data scientists find that if a song has lyrics, they are more important than the music for the feelings analysis.

  • The results should be sent to another Kafka topic for downstream analysis. Each record should be a JSON object containing a single feeling value.
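
Before mapping the requirements above to stream processors, it may help to see the per-record logic they imply. The following is a rough sketch in plain Java with no Kafka Streams calls. Everything named here is an assumption made for illustration: the ListenEvent fields, the three API interfaces, and the idea that an instrumental track is one for which the lyrics API returns nothing. The requirements do not specify the JSON schema or the API signatures.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Sketch of the per-record logic only; all types and names are hypothetical.
class FeelingsPipelineSketch {

    // Assumed shape of a listen event; the real JSON schema is not given.
    static final class ListenEvent {
        final String trackId;
        final int secondsListened;

        ListenEvent(String trackId, int secondsListened) {
            this.trackId = trackId;
            this.secondsListened = secondsListened;
        }
    }

    // Hypothetical stand-ins for the three APIs from the requirements.
    interface LyricsApi {
        // Assumption: an empty Optional means the track is instrumental.
        Optional<String> lyricsFor(String trackId);
    }

    interface MusicFeelingsApi {
        List<String> feelingsForTrack(String trackId);
    }

    interface TextFeelingsApi {
        List<String> feelingsForText(String text);
    }

    static final int MIN_SECONDS = 30;

    static List<String> feelings(List<ListenEvent> events,
                                 LyricsApi lyricsApi,
                                 MusicFeelingsApi musicApi,
                                 TextFeelingsApi textApi) {
        return events.stream()
                // Disregard tracks listened to for less than 30 seconds.
                .filter(event -> event.secondsListened >= MIN_SECONDS)
                // One event becomes zero, one, or many single-feeling records.
                .flatMap(event -> {
                    Optional<String> lyrics = lyricsApi.lyricsFor(event.trackId);
                    // Lyrics take priority over the music when they exist.
                    List<String> found = lyrics.isPresent()
                            ? textApi.feelingsForText(lyrics.get())
                            : musicApi.feelingsForTrack(event.trackId);
                    return found.stream();
                })
                .collect(Collectors.toList());
    }
}
```

The sketch folds the choice between lyrics and music into a single step for brevity. Routing instrumental and noninstrumental tracks into separate branches and combining the results afterwards would be an equally valid reading of the requirements.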

Processor topology

Using the high-level DSL operators presented at the beginning of this lesson, let’s try to design our topology:
