Introduction to Streaming

Learn what streaming is and how real-time processing differs from batch processing.

Overview

Streaming data is generated and delivered continuously, in a never-ending manner, and at a variable rate. It can be of two types: inbound or outbound.

Inbound streaming data can arrive so fast and in such volume that storing it is futile, not worthwhile, or infeasible. It is also almost impossible to regulate its structure, enforce data integrity, or control the volume and velocity at which it is generated.

That means our application needs to extract knowledge from the data as soon as it arrives. In other words, speed matters most in big data streaming, because data loses value the longer it waits to be processed.

Stream processing

Stream processing is a technique used to process and analyze data in motion.

Because a stream of data is potentially infinite, we cannot be sure that it will fit in the memory we have. To address this issue, we use a time-based sliding window: at any moment, we look only at the data that arrived in the last N seconds or so.
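
To make the sliding window idea concrete, here is a minimal Python sketch. The class name `SlidingWindow`, its methods, and the sample readings are all hypothetical and chosen only for illustration. It also assumes that events arrive with non-decreasing timestamps, which a real stream may not guarantee.

```python
from collections import deque


class SlidingWindow:
    """Keeps only the events that arrived in the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record a new event, then drop everything that fell out of the window."""
        self.events.append((timestamp, value))
        self._evict(timestamp)

    def _evict(self, now):
        # Assumption: timestamps are non-decreasing, so the oldest
        # events are always at the left end of the deque.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def average(self):
        """Average of the values currently in the window, or None if it is empty."""
        if not self.events:
            return None
        return sum(value for _, value in self.events) / len(self.events)


# Hypothetical readings: (arrival time in seconds, measured value).
window = SlidingWindow(window_seconds=10)
for timestamp, value in [(0, 10.0), (4, 20.0), (12, 30.0)]:
    window.add(timestamp, value)

print(window.average())  # prints 25.0, because the reading at t=0 has expired
```

Note that a time-based window bounds how old the retained data can be, not how many events it holds. If the arrival rate spikes, the window can still grow large, so a production system would typically combine it with a size limit or sampling.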

Of course, we need to make sure that the first level of processing is fast. Usability suffers if our initial processing is so slow that it misses 80% of the data in a sliding window. If we cannot keep up with the incoming data, we need to make sure we can catch up using caching techniques.
