Stream Processing
Learn about stream processing and its role in handling unbounded data flows in large-scale systems. Understand how stream processing reduces delays in data analysis compared to batch methods, and explore real-world tools such as Apache Kafka and Apache Flink. Gain insight into processing data as it arrives or in small batches, depending on business needs.
In the word-counting example for the MapReduce algorithm, the entire text was already stored somewhere, and the mapper machines loaded chunks of that data and processed them. In other words, the data was organized in batches: the processing system loaded each batch, did its job, and eventually produced some form of output data.
In that example, the input was effectively bounded: we assumed that we had all the text of English literature. All we had to do was run the MapReduce algorithm on the data and gather the results.
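To make the batch model concrete, here is a minimal single-process sketch of word counting over a bounded input. It is only an illustration, not a distributed MapReduce implementation: the function names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `batch_word_count`), the chunk size, and the tiny sample corpus are all assumptions made for this example. The point to notice is that the whole input exists before processing starts, so the job can split it, process every chunk, and finish.

```python
from collections import Counter
from typing import Iterable, List


def split_into_chunks(lines: List[str], chunk_size: int) -> List[List[str]]:
    """Split a bounded list of lines into fixed-size chunks (the "batches")."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk: Iterable[str]) -> Counter:
    """Map phase (hypothetical helper): count the words in one chunk."""
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce phase (hypothetical helper): merge the per-chunk counts."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def batch_word_count(lines: List[str], chunk_size: int = 2) -> Counter:
    """Run the whole job: this only works because `lines` is finite."""
    chunks = split_into_chunks(lines, chunk_size)
    partials = [map_chunk(chunk) for chunk in chunks]  # one "mapper" per chunk
    return reduce_counts(partials)


if __name__ == "__main__":
    # Assumed sample input standing in for the stored text.
    corpus = ["to be or not to be", "that is the question", "to be"]
    print(batch_word_count(corpus).most_common(2))
```

In a real cluster, each call to `map_chunk` would run on a separate mapper machine; here they run one after another in a single process to keep the sketch short.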
Now for the important question:
What if the data is unbounded?
How to handle unbounded data
In a real-life system that needs data processing, the data is almost always unbounded. Let’s quickly discuss an example.
Assume the engineers at Instagram decide to analyze how users behave on Instagram videos. They will want to track user actions in the app to understand how people use it or interact with specific content. The ...