Mapper

We’ll start by examining Hadoop’s Java classes to see how the abstract concepts of map and reduce are translated into code. The Map phase of a MapReduce job is implemented by a Java class called Mapper. It maps input key/value pairs to a set of intermediate key/value pairs. Conceptually, a mapper performs parsing, projection (selecting the fields of interest from the input), and filtering (removing uninteresting or malformed records). The Mapper class is defined in the package org.apache.hadoop.mapreduce as follows:

Mapper class

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...class body
}

The Mapper class provided by Hadoop serves as the base class for our custom mapper. Both phases of a MapReduce job take key/value pairs as input and produce key/value pairs as output. Accordingly, the Mapper class has four generic parameters: the types of the input key, input value, output key, and output value. The Mapper class defines a map(...) method that contains the mapping logic. This method is overridden in the user’s derived mapper class and contains custom logic for executing the map ...
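To make the override pattern concrete, here is a minimal sketch that runs without a Hadoop installation. It does not use Hadoop's own Mapper. SimpleMapper is a hypothetical stand-in that copies only the four-parameter shape, and a plain BiConsumer plays the role that the Context argument plays in Hadoop's map(KEYIN key, VALUEIN value, Context context). The input format (CSV lines of city, temperature, station) and the name CityTemperatureMapper are also assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class MapperSketch {

    // Hypothetical stand-in for Hadoop's Mapper: same four type parameters,
    // but NOT the real class. The BiConsumer replaces Hadoop's Context.
    static class SimpleMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
        // Default: emit nothing. Derived classes override this method.
        protected void map(KEYIN key, VALUEIN value, BiConsumer<KEYOUT, VALUEOUT> out) {
        }
    }

    // Derived mapper. Input: (line offset, CSV line). Output: (city, temperature).
    static class CityTemperatureMapper extends SimpleMapper<Long, String, String, Integer> {
        @Override
        protected void map(Long offset, String line, BiConsumer<String, Integer> out) {
            String[] fields = line.split(",");              // parsing
            if (fields.length < 2) {
                return;                                     // filtering: too few fields
            }
            try {
                int temperature = Integer.parseInt(fields[1].trim());
                out.accept(fields[0].trim(), temperature);  // projection: drop the station field
            } catch (NumberFormatException e) {
                // filtering: temperature is not a number, so skip the record
            }
        }
    }

    // Feeds each line to the mapper and collects the intermediate key/value pairs.
    public static List<Map.Entry<String, Integer>> run(List<String> lines) {
        List<Map.Entry<String, Integer>> results = new ArrayList<>();
        CityTemperatureMapper mapper = new CityTemperatureMapper();
        long offset = 0;
        for (String line : lines) {
            mapper.map(offset, line, (k, v) -> results.add(new AbstractMap.SimpleEntry<>(k, v)));
            offset += line.length() + 1; // rough offset of the next line, assuming a 1-char line break
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Paris,21,S1", "garbage", "Oslo,abc,S2", "Cairo,35,S3");
        System.out.println(run(input));
    }
}
```

Of the four input lines, only the two well-formed ones produce output pairs. The malformed records are dropped inside map, which is the filtering role described earlier. A real Hadoop mapper would extend org.apache.hadoop.mapreduce.Mapper and would typically use Hadoop's Writable types (such as LongWritable and Text) rather than Long and String.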
