Introduction to MapReduce

In this lesson, we'll look at the inspiration behind MapReduce, its goal, and how it works.

MapReduce is a framework for batch data processing originally developed internally at Google by Dean et al. It was later incorporated into the wider Apache Hadoop framework.


The framework draws inspiration from the field of functional programming and is based on the following main idea:


Many real-world computations can be expressed with the use of two main primitive functions, map and reduce.


The map function processes a set of key-value pairs and produces another set of intermediate key-value pairs as output.


The reduce function receives all the values associated with each intermediate key and merges them into a single value according to some application-specific logic.
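The canonical example is word counting. The sketch below is a minimal, single-machine illustration of the two primitives; the function names (`map_fn`, `reduce_fn`, `run_job`) and the driver loop are illustrative assumptions, not the actual Google or Hadoop API.

```python
from collections import defaultdict

def map_fn(key, value):
    # key: a document name, value: the document's contents.
    # Emit an intermediate (word, 1) pair for every word.
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    # key: a word, values: all counts emitted for that word.
    # Merge the values into a single result per key.
    return key, sum(values)

def run_job(documents):
    # A toy driver standing in for the framework: it groups the
    # intermediate pairs by key before invoking reduce_fn.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return dict(reduce_fn(word, counts) for word, counts in intermediate.items())

docs = {"d1": "the quick brown fox", "d2": "the lazy dog"}
print(run_job(docs))  # {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

Note that the application only writes `map_fn` and `reduce_fn`; everything in `run_job` is the framework's job in a real deployment.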

An important property of map and reduce functions

The map and reduce functions can easily be parallelized and run across multiple machines, each operating on a different part of the dataset. As a result:

  • The application code is responsible for defining these two methods.
  • The framework is responsible for partitioning the data, scheduling the program’s execution across multiple nodes, handling node failures, and managing the required inter-machine communication.
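To make this division of labor concrete, here is a sketch of the framework's side of the contract on a single machine: it partitions the input, schedules map tasks in parallel, shuffles the intermediate pairs by key, and reduces each group. It uses threads as a stand-in for multiple machines, and omits the failure handling and inter-machine communication a real framework provides; `map_partition` and `word_count` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_partition(lines):
    # Apply the user's map logic to one partition of the input.
    pairs = []
    for line in lines:
        for word in line.split():
            pairs.append((word, 1))
    return pairs

def word_count(lines, workers=4):
    # 1. Partition the input data.
    partitions = [lines[i::workers] for i in range(workers)]
    # 2. Schedule the map tasks in parallel (threads stand in for nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_partition, partitions)
    # 3. Shuffle: group intermediate values by key.
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    # 4. Reduce each key's values to a single value.
    return {key: sum(values) for key, values in grouped.items()}

print(word_count(["the quick brown fox", "the lazy dog", "the end"]))
```

Steps 1 through 3 are exactly the responsibilities the framework takes off the application's hands; only the per-word logic inside `map_partition` and the `sum` in step 4 would be user code.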
