MapReduce Execution in Hadoop

In this article we have tried to summaries, how a MapReduce program executes in Hadoop environment.

MapReduce 1 Execution Sequence:

MapReduce execution starts with below command.

Step 1: $ hadoop jar <jar> [mainClass] args...

This command starts the MapReduce execution in Clients JVM. creates the Job.

Step 2: JobTracker.getNewJobId()

Client Asks JobTracker for a new JobId.

Step 3: JobClient.SubmitJob() / JobClient.runJob()

Checking the input and output specifications of the job. Computing the InputSplits for the job. Setup the requisite accounting information for the DistributedCache of the job, if necessary.
Copying the job’s jar and configuration to the JobTracker file-system, in a folder names as the JobId assigned with very high replication factor(default 10).
Submitting the job to the JobTracker and optionally monitoring it’s status.
JobTracker puts the job into an internal queue from where JobScheduler will pick it up and Initialize

Step 4: Initialize the Job

Creates an object of the Job. Encapsulates its Tasks.
Retrieve the InputSplit and create one map task for each split.
Creates the reduces task the number of reduce task depends on the number defined in the driver code(default 1).
Creates a Job Setup task to setup the job before map tasks run.
Creates a Job Cleanup task to run after the reducer task run.

mr1_execution