MapReduce's Manager-Worker Architecture

Let's study the architecture of MapReduce and the guarantees it provides.

The manager node is responsible for scheduling tasks for worker nodes and managing their execution, as shown in the following illustration:

Architecture of MapReduce
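The manager's scheduling role can be sketched as a small, single-process model. This is a minimal sketch, not a real implementation. It assumes the manager tracks each task's state (idle, in-progress, completed) and the worker running it. It also assumes, as a simplification, that reduce tasks are handed out only after every map task has completed. The names `Manager`, `Task`, `request_task`, and `report_done` are hypothetical. In a real cluster, workers would run on separate machines and talk to the manager over the network.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"


@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    index: int                    # 0..M-1 for map tasks, 0..R-1 for reduce tasks
    state: State = State.IDLE
    worker: Optional[str] = None  # worker the task is assigned to, if any


@dataclass
class Manager:
    num_map: int     # M, the number of map tasks
    num_reduce: int  # R, the number of reduce tasks
    tasks: List[Task] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # All M map tasks and R reduce tasks start out idle.
        self.tasks = [Task("map", i) for i in range(self.num_map)]
        self.tasks += [Task("reduce", i) for i in range(self.num_reduce)]

    def _maps_done(self) -> bool:
        return all(t.state is State.COMPLETED for t in self.tasks if t.kind == "map")

    def request_task(self, worker: str) -> Optional[Task]:
        """Called by an idle worker; returns the next task to run, or None."""
        for t in self.tasks:
            if t.state is not State.IDLE:
                continue
            # Simplifying assumption (hypothetical policy): reduce tasks are
            # handed out only after every map task has completed.
            if t.kind == "reduce" and not self._maps_done():
                continue
            t.state, t.worker = State.IN_PROGRESS, worker
            return t
        return None  # nothing schedulable right now

    def report_done(self, task: Task) -> None:
        """Called when a worker reports that it finished `task`."""
        task.state = State.COMPLETED

    def all_done(self) -> bool:
        return all(t.state is State.COMPLETED for t in self.tasks)
```

Take `Manager(num_map=2, num_reduce=1)` as an example. Suppose a third worker asks for work while both map tasks are still in progress. It gets `None`. It receives the reduce task only after both map tasks have been reported done.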

Apart from defining the map and reduce functions, the user can also specify the number M of map tasks, the number R of reduce tasks, the number of input and output files, and a partitioning function. The partitioning function defines how the key-value pairs emitted by the map tasks are divided among the reduce tasks. By default, MapReduce uses a hash partitioner, which assigns each key to a reduce task using the formula hash(key) mod R.
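The default partitioner, hash(key) mod R, and a user-supplied alternative can be sketched as follows. This is a minimal sketch with hypothetical function names (`default_partition`, `partition`, `host_partition`). One assumption is worth flagging. `zlib.crc32` stands in for `hash` because it gives the same result in every process. Python's built-in `hash()` of a `str` is salted per process, so different workers could disagree on a key's partition. The `host_partition` function is an illustrative custom partitioner. It sends all URLs from the same host to the same reduce task.

```python
import zlib
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlparse

Pair = Tuple[str, int]


def default_partition(key: str, num_reduce: int) -> int:
    """Return hash(key) mod R, using crc32 as a deterministic stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce


def partition(
    pairs: Iterable[Pair],
    num_reduce: int,
    partitioner: Callable[[str, int], int] = default_partition,
) -> List[List[Pair]]:
    """Split map output into R buckets, one per reduce task."""
    buckets: List[List[Pair]] = [[] for _ in range(num_reduce)]
    for key, value in pairs:
        buckets[partitioner(key, num_reduce)].append((key, value))
    return buckets


def host_partition(url: str, num_reduce: int) -> int:
    """Hypothetical custom partitioner: hash only the URL's host name."""
    host = urlparse(url).hostname or ""
    return default_partition(host, num_reduce)
```

The partition depends only on the key, so every value for a given key lands in the same bucket. For example, both `("the", 1)` pairs from a word-count map output go to the same reduce task. That is what lets a single reduce call see all the values for its key.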

Steps for the execution of MapReduce

The execution of MapReduce proceeds in the ...
