Combiner and Partitioner
Explore how to implement combiners to aggregate intermediate map outputs and reduce network latency. Understand how partitioners assign keys to reducers for efficient data distribution in MapReduce. This lesson helps you optimize Big Data workflows by managing data flow during map and reduce phases.
In this lesson, we implement two optional MapReduce features that we discussed earlier: the combiner and the partitioner.
Combiner
We can specify a combiner class that acts on the output of each map task, processing the values emitted for each key. One of the main reasons to implement a combiner is to aggregate the intermediate map output locally, on the node that produced it. As a result, fewer bytes are transferred over the wire during the shuffle. Transferring data over a network introduces significant latency, so the less data put on the wire, the better.
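To make the effect concrete, here is a minimal, framework-free Java sketch of word count that runs a combiner over a single map task's output. It does not use Hadoop; the class and method names (`CombinerSketch`, `map`, `combine`) are hypothetical and chosen only for illustration. The combiner sums the `1`s emitted for each word, so the eight records produced by the map step shrink to five before they would be shuffled.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of a combiner for word count.
public class CombinerSketch {

    // Map step: emit (word, 1) for every token in the input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) { // skip the empty token produced by leading whitespace
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Combiner: runs on one map task's output before the shuffle and
    // sums the values per key, so fewer records would cross the network.
    // TreeMap keeps keys sorted, mirroring the sorted map output in MapReduce.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            sums.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> raw = map("the cat and the dog and the bird");
        List<Map.Entry<String, Integer>> combined = combine(raw);
        System.out.println("records before combiner: " + raw.size());     // 8
        System.out.println("records after combiner:  " + combined.size()); // 5
        System.out.println(combined); // [and=2, bird=1, cat=1, dog=1, the=3]
    }
}
```

Summing works as a combiner because addition is commutative and associative: the framework may run a combiner zero, one, or several times on a map task's output, so the final counts must not depend on whether it runs. In Hadoop's Java API, a combiner is registered with `Job.setCombinerClass(...)`, and for word count the reducer class is commonly reused for this purpose.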
In our mapper ...