...

Running MapReduce End to End

This lesson demonstrates running a MapReduce job in a cluster.

We know how to run a MapReduce job using the code widget. In this lesson, we’ll learn to submit the job to a Hadoop cluster. For this purpose, we use a pseudo-distributed Hadoop cluster running in a Docker environment.

Conceptually, the end-to-end flow works as follows: we copy the input data into HDFS, submit the job’s jar to the cluster, and, once the job completes, read the results back from HDFS.

The following code snippet lists the commands that run a MapReduce job end to end. Each command is explained later in the lesson. You may read the explanation first and then execute the commands in the terminal. At the end of the lesson, a video shows an execution run of these commands.

Exercise

# Click on the terminal below and execute the commands in order
/DataJek/startHadoop.sh
jps
hdfs dfs -copyFromLocal /DataJek/cars.data /
hdfs dfs -ls /
hadoop jar JarDependencies/MapReduceJarDependencies/MapReduce-1.0-SNAPSHOT.jar io.datajek.mapreduce.Driver /cars.data /MyJobResult
hdfs dfs -ls /MyJobResult
hdfs dfs -text /MyJobResult/part-r-00000
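
One practical note: the job will refuse to launch if its output directory already exists in HDFS, because the standard output format does not overwrite existing data. To re-run the exercise, delete the previous results first. A minimal sketch, assuming the same paths as above:

# Remove the old output directory; the job fails fast if it already exists
hdfs dfs -rm -r /MyJobResult
# Resubmit the job with the same input and output paths
hadoop jar JarDependencies/MapReduceJarDependencies/MapReduce-1.0-SNAPSHOT.jar io.datajek.mapreduce.Driver /cars.data /MyJobResult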

Explanation

  1. Copy and execute the following command in the terminal:

    /DataJek/startHadoop.sh
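
    The script’s contents aren’t shown in this lesson, but a startup script for a pseudo-distributed cluster typically boils down to something like the following sketch (an assumption about what startHadoop.sh may do, not the actual file):

    # Sketch only; the real startHadoop.sh may differ
    # Start the HDFS daemons: NameNode, DataNode, and SecondaryNameNode
    $HADOOP_HOME/sbin/start-dfs.sh
    # Start the YARN daemons: ResourceManager and NodeManager
    $HADOOP_HOME/sbin/start-yarn.sh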
    
  2. Once the script finishes running, execute the following command:

    jps
    

    The jps command lists all the running Java processes. If you see the following six processes running, then the pseudo-distributed Hadoop cluster is working correctly:

    • NameNode
    • DataNode
    • NodeManager
    • ResourceManager
    • SecondaryNameNode
...