This lesson describes the Mapper function used to implement the Map phase.


We’ll start by examining Hadoop’s Java classes to see how the abstract concepts of map and reduce are translated into code. The Map phase of a MapReduce job is implemented by a Java class called Mapper, which maps input key/value pairs to a set of intermediate key/value pairs. Conceptually, a mapper performs parsing, projection (selecting fields of interest from the input), and filtering (removing uninteresting or malformed records). The Mapper class is defined as follows in the package org.apache.hadoop.mapreduce:

Mapper class

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
        //...class body
}

We subclass the Mapper class provided by Hadoop to write our own custom mapper class. Both phases of a MapReduce job take key/value pairs as input and produce key/value pairs as output. Looking at the Mapper class declaration, you can see four generic parameters: KEYIN and VALUEIN represent the types of the mapper’s input key/value pairs, and KEYOUT and VALUEOUT the types of its output pairs. The Mapper class defines a map(...) method that contains the mapping logic. This method is overridden in the user’s derived mapper class with the custom logic for executing the map phase of the user’s job.
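To make the four generic parameters concrete, here is a simplified stand-in (not Hadoop’s actual source) that mirrors the shape of the Mapper class. Like Hadoop’s real Mapper, its default map(...) is an identity mapping that writes each input pair through unchanged; the real method also receives a framework-managed Context and declares throws IOException, InterruptedException, omitted here so the sketch runs without a Hadoop installation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for org.apache.hadoop.mapreduce.Mapper, illustrating
// the four generic parameters: KEYIN/VALUEIN are the input pair types,
// KEYOUT/VALUEOUT the output pair types.
class MiniMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

    // Stand-in for Mapper.Context: the real one forwards pairs to the
    // shuffle phase; this one just collects them for inspection.
    class Context {
        final List<Object[]> output = new ArrayList<>();
        void write(KEYOUT key, VALUEOUT value) {
            output.add(new Object[] { key, value });
        }
    }

    // As in Hadoop's Mapper, the default map(...) is an identity mapping:
    // each input pair is written through unchanged. Derived classes
    // override this method with job-specific logic.
    @SuppressWarnings("unchecked")
    protected void map(KEYIN key, VALUEIN value, Context context) {
        context.write((KEYOUT) key, (VALUEOUT) value);
    }
}
```

A derived class picks concrete types for the four parameters and overrides map(...) — exactly what our car-counting mapper will do below.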

At times, having an input key for the map phase may not make sense. In our car-counting example, we only care about the strings representing the car make in the input file. The key received by the map(...) method is a long value representing the byte offset of the beginning of the line from the start of the file. The logic of our mapper function is trivial: whenever we see a brand name like Toyota, we output the name as the key and a count of 1. This indicates that we came across one car of that particular brand. Similarly, whenever we see the string Mercedes, we output the text Mercedes as the key and a count of 1 to record that we saw one car of the Mercedes make, and so on. Our mapper class looks like this:
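A minimal sketch of such a mapper follows, assuming each input line holds a single brand name. To keep the sketch compilable and runnable without a Hadoop installation, Text, IntWritable, and Context below are tiny stand-ins for the real classes; an actual job would import them from org.apache.hadoop.io and extend org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable>.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.hadoop.io.Text: a writable wrapper around a string.
class Text {
    private final String s;
    Text(String s) { this.s = s; }
    @Override public String toString() { return s; }
}

// Stand-in for org.apache.hadoop.io.IntWritable: a writable wrapper around an int.
class IntWritable {
    private final int v;
    IntWritable(int v) { this.v = v; }
    int get() { return v; }
}

// Stand-in for Mapper.Context: the real one hands emitted pairs to the
// shuffle phase; this one just collects them as "key\tvalue" strings.
class Context {
    final List<String> output = new ArrayList<>();
    void write(Text key, IntWritable value) {
        output.add(key + "\t" + value.get());
    }
}

class CarCountMapper {
    private static final IntWritable ONE = new IntWritable(1);

    // key: byte offset of the line within the input file (ignored here);
    // value: one line of input, assumed to hold a single brand name.
    void map(long key, Text value, Context context) {
        String brand = value.toString().trim();
        if (!brand.isEmpty()) {                   // filter empty/malformed lines
            context.write(new Text(brand), ONE);  // emit (brand, 1)
        }
    }
}
```

The mapper only emits (brand, 1) pairs; summing the counts for each brand is left to the reduce phase.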
