MapReduce's Manager-Worker Architecture
Let's study the architecture of MapReduce and the guarantees provided by it.
The manager node is responsible for scheduling tasks for worker nodes and managing their execution, as shown in the following illustration:
Apart from the definition of the map
and reduce
functions, the user can also specify the number M
of map
tasks, the number R
of reduce
tasks. MapReduce can also specify the number of input or output files, and a partitioning function that defines how key-value pairs from the map
tasks are partitioned before being processed by the reduce
tasks. By default, a hash partitioner is used that selects a reduce
task using the formula hash(key) mod R
.
Steps for the execution of MapReduce
The execution of MapReduce proceeds in the ...
Get hands-on with 1400+ tech skills courses.