
Mapper

Explore how the Hadoop Mapper class implements the map phase in MapReduce by processing input data into key-value pairs. Learn the role of parsing, projection, and filtering in your mapper code, and understand how output is managed with buffering, spill files, and partitioning. This lesson helps you grasp the basics of writing custom mappers and how intermediate data is handled before reduction.

Mapper

We’ll start by examining Hadoop’s Java classes to see how the abstract concepts of map and reduce are translated into code. The Map phase of a MapReduce job is implemented by a Java class called Mapper, which maps input key/value pairs to a set of intermediate key/value pairs. Conceptually, a mapper performs parsing, projection (selecting the fields of interest from the input), and filtering (removing non-interesting or malformed records). The Mapper class is defined as follows in the package org.apache.hadoop.mapreduce:

Mapper class

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...class body
}

The Mapper class provided by Hadoop serves as the base class for our custom derived mapper class. Both phases of a MapReduce job take key/value pairs as input and produce key/value pairs as output. If you look at the Mapper class, you can see four generic parameters representing the input and output types of the mapper. The Mapper class defines a map(...) method that contains the mapping logic. This method is overridden in the user’s derived mapper class and contains custom logic for executing the map ...
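
As a minimal sketch of such a derived mapper (the input format, field positions, and class name below are illustrative assumptions, not taken from the lesson), the class might parse comma-separated records, project out the fields of interest, and filter out malformed lines:

Custom mapper class (sketch)

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: reads CSV lines of the form "userId,country,purchaseAmount",
// projects out the country and amount, and filters malformed records.
public class PurchaseByCountryMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text country = new Text();
    private final IntWritable amount = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Parsing: split the raw input line into fields.
        String[] fields = value.toString().split(",");

        // Filtering: skip records that do not have the expected number of fields.
        if (fields.length != 3) {
            return;
        }

        try {
            // Projection: keep only the country and the purchase amount.
            country.set(fields[1].trim());
            amount.set(Integer.parseInt(fields[2].trim()));
            context.write(country, amount);
        } catch (NumberFormatException e) {
            // Filtering: drop records whose amount is not a valid integer.
        }
    }
}

Here the four generic parameters are LongWritable and Text for the input key/value pair (the byte offset and content of each line, as produced by the default text input format) and Text and IntWritable for the intermediate key/value pair emitted via context.write(...).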