A MapReduce Example

Get familiar with MapReduce using a practical example.

MapReduce in practice

To understand MapReduce, we first need to understand its importance. Suppose we have a vast dataset of the text files. Each file has multiple lines and is separated into numerous nodes.

For this lesson, we will continue with the tennis ball example discussed in an earlier lesson.

Due to lack of infrastructure, in our example, we will do the processing in one node. However, this will make no difference in our interaction, at least from a development perspective. Infrastructure administration has some overhead, of course.

First, we store information about our products in various files in HDFS.

Those files contain the name of the color of each tennis ball that matches the criteria.

Note: As explained in the previous lesson, this might not be an example worth solving with Hadoop in real life, but it is helpful to understand the complete workflow.

So we have an input file where the first 5 lines look like this:

green
green
pink
red

Let’s say we have a requirement to make sure the number of tennis balls produced is less than a certain threshold, as dictated by the wise leaders of Tennis-iso (an imaginary organization) and the sales expectations of the factory.

So we will create a Hadoop job to count the tennis balls.

Example

To help you, we have provided the files in the widget below. You can follow the steps listed below to run this MapReduce job.

Click on the “Run” button to execute the code. Please wait for a few minutes as the Hadoop cluster takes a while to be created. In the meantime, we can continue with our lesson to learn how to execute the example.

Get hands-on with 1200+ tech skills courses.