
Map Reduce Framework

Explore the Map Reduce programming model, which enables processing large datasets in parallel across distributed systems. Understand its map and reduce tasks, how key-value pairs are generated and aggregated, and why it suits batch processing rather than real-time or streaming data.

Map Reduce

Map Reduce is a programming model introduced by Google and implemented as part of the Hadoop ecosystem. It enables us to process large datasets in a distributed environment in a parallel manner.

Map Reduce working

Map Reduce consists of two tasks, Map and Reduce, as the diagram above shows. The Reduce task runs after the Map task has completed. Map tasks take in the input, apply the processing logic, and produce output in the form of (Key, Value) pairs.

Next, the Reducer receives the (Key, Value) pairs from multiple Map jobs, as also visible in the diagram above. The Reducer's responsibility is to aggregate the intermediate results produced by the Mapper functions and produce the final output.
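
To make the Map and Reduce contract concrete, here is a minimal sketch of the classic word count job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class names TokenizerMapper, SumReducer, and WordCount are illustrative, not from the original text, and the sketch assumes a standard Hadoop setup with input and output paths passed as arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map task: emit an intermediate (word, 1) pair for every word in the line.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(line.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE); // intermediate (Key, Value) pair
      }
    }
  }

  // Reduce task: aggregate the counts for each word across all mappers.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable count : counts) {
        sum += count.get();
      }
      total.set(sum);
      context.write(word, total); // final (word, totalCount) pair
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Each mapper emits one (word, 1) pair per token; the framework then groups the intermediate pairs by key, so each reducer receives a word together with all of its counts and sums them into the final total.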

Word count example

Map Reduce word count example

In ...