MapReduce in practice

To understand MapReduce, we first need to understand its importance. Suppose we have a vast dataset of the text files. Each file has multiple lines and is separated into numerous nodes.

For this lesson, we will continue with the tennis ball example discussed in an earlier lesson.

Due to lack of infrastructure, in our example, we will do the processing in one node. However, this will make no difference in our interaction, at least from a development perspective. Infrastructure administration has some overhead, of course.

First, we store information about our products in various files in HDFS.

Those files contain the name of the color of each tennis ball that matches the criteria.

Note: ...

Before We Begin

Setting The Stage

The Hadoop Ecosystem

Streaming

Apache Spark

Conclusion

A MapReduce Example

MapReduce in practice