Introduction to MapReduce

Let's find the inspiration behind MapReduce. Also, look into the goal and the working of MapReduce.

MapReduce is a framework for batch data processing originally developed internally in Google by Dean et al… It was later incorporated in the wider Apache Hadoop framework.

Inspiration

The framework draws inspiration from the field of functional programming and is based on the following main idea:

Idea

Many real-word computations can be expressed with the use of two main primitive functions, map and reduce.

Map

The map function processes a set of key-value pairs and produces another set of intermediate key-value pairs as output.

Reduce

The reduce function receives all the values for each key and returns a single value, essentially merging all the values according to some logic

An important property of map and reduce functions

The map and reduce functions can easily be parallelized and run across multiple machines for different parts of the dataset. As a result,

  • The application code is responsible for defining these two methods.
  • The framework is responsible for partitioning the data, scheduling the program’s execution across multiple nodes, handling node failures, and managing the required inter-machine communication.

Get hands-on with 1200+ tech skills courses.