
Basics

Explore the basics of MapReduce, a distributed computing model that processes big data using map and reduce phases. Understand how it divides tasks across clusters for scalable, fault-tolerant batch processing. Gain insight into its key-value input-output model and practical applications in big data analysis.

Map and Reduce

MapReduce is a concatenation of "map" and "reduce," which aptly describes the two phases it comprises. MapReduce is an implementation of a computing model introduced by Google, in which data-parallel computations are executed on clusters of unreliable machines. The underlying system automatically provides locality-aware scheduling, fault tolerance, and load balancing. In simpler terms, think of MapReduce as a divide-and-conquer strategy: a huge data set is divided among worker machines, and once each machine finishes processing its share, the partial results are aggregated into a final solution. The data flow in the various phases of a MapReduce job is shown below.

MapReduce is a programming model used to process large data sets on a cluster of commodity machines by using a distributed algorithm.
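To make the map and reduce phases concrete, here is a minimal single-process sketch of the classic word-count job. The input chunks, function names, and grouping step are illustrative assumptions, not part of any real framework: the "shuffle" that a real MapReduce system performs across the network is simulated here with an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical input, pre-split into chunks as if assigned to workers.
chunks = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog",
]

def map_phase(chunk):
    # Map: each worker emits (key, value) pairs -- here, (word, 1).
    return [(word, 1) for word in chunk.split()]

def shuffle(mapped):
    # Shuffle: group all intermediate values by key across workers.
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for one key.
    return key, sum(values)

mapped = [map_phase(c) for c in chunks]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["the"])  # 3
```

Because each chunk is mapped independently and each key is reduced independently, both phases parallelize naturally across machines, which is exactly what makes the model scale.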

For all its strengths, MapReduce is fundamentally a batch processing system and is not suitable for interactive analysis. You can’t run a query and get results back quickly. Queries typically take minutes or more, so it’s best for offline use, when there ...