MapReduce Execution in Hadoop
This article summarizes how a MapReduce program executes in a Hadoop environment.
MapReduce 1 Execution Sequence:
MapReduce execution starts with the command below.
Step 1: $ hadoop jar <jar> [mainClass] args...
This command starts the MapReduce program in the client's JVM and creates the job.
Step 2: JobTracker.getNewJobId()
The client asks the JobTracker for a new job ID.
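As a rough illustration of what the client gets back, the sketch below mimics the textual shape of a MapReduce 1 job ID: the prefix job_, an identifier for the JobTracker instance, and a zero-padded counter. The class JobIdSketch and the sample identifier 201401011200 are hypothetical and exist only for this example; this is not the JobTracker's actual code.

```java
// Hypothetical sketch: hands out job IDs shaped like job_<trackerId>_<NNNN>.
class JobIdSketch {
    // Identifier of the JobTracker instance (assumed here to be a start timestamp).
    private final String jobTrackerId;
    // Counter of jobs handed out so far; the first job gets number 1.
    private int nextJobNumber = 1;

    JobIdSketch(String jobTrackerId) {
        this.jobTrackerId = jobTrackerId;
    }

    // Returns a fresh ID on every call; synchronized so concurrent clients never share one.
    synchronized String getNewJobId() {
        return String.format("job_%s_%04d", jobTrackerId, nextJobNumber++);
    }

    public static void main(String[] args) {
        JobIdSketch tracker = new JobIdSketch("201401011200");
        System.out.println(tracker.getNewJobId());
        System.out.println(tracker.getNewJobId());
    }
}
```

Each call returns a distinct ID, which the later steps use to name the job's staging directory.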
Step 3: JobClient.submitJob() / JobClient.runJob()
- Checks the input and output specifications of the job.
- Computes the InputSplits for the job.
- Sets up the requisite accounting information for the job's DistributedCache, if necessary.
- Copies the job's JAR and configuration to the JobTracker's file system, into a folder named after the assigned job ID. The job JAR is stored with a very high replication factor (default 10).
- Submits the job to the JobTracker and optionally monitors its status.
- The JobTracker puts the job into an internal queue, from which the job scheduler picks it up and initializes it.
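The split computation in the steps above can be sketched as follows. This is a deliberately simplified model that cuts a single file into fixed-size byte ranges. The class SplitSketch, its Split type, and the method computeSplits are hypothetical names for this example, and Hadoop's real FileInputFormat applies further rules (minimum and maximum split sizes, non-splittable files) that are left out here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: divides one file into byte-range splits of a fixed size.
class SplitSketch {
    // Stand-in for an InputSplit: a start offset and a length within the file.
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Returns consecutive splits covering [0, fileLength); the last one may be shorter.
    static List<Split> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            splits.add(new Split(offset, Math.min(splitSize, fileLength - offset)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with a 128 MB split size (values chosen for illustration).
        for (Split s : computeSplits(300 * mb, 128 * mb)) {
            System.out.println("start=" + s.start + " length=" + s.length);
        }
    }
}
```

With the sample values, the file is covered by three splits: two of 128 MB and a final one of 44 MB.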
Step 4: Initialize the Job
- Creates an object that represents the job and encapsulates its tasks.
- Retrieves the InputSplits and creates one map task for each split.
- Creates the reduce tasks. The number of reduce tasks is the number set in the driver code (default 1).
- Creates a job setup task, which sets up the job before any map task runs.
- Creates a job cleanup task, which runs after the reduce tasks have finished.
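The initialization steps above can be sketched as a small model that builds the task list for a job. The class JobInitSketch, the enum TaskType, and the method initTasks are hypothetical names for this illustration, not Hadoop classes. The sketch only captures the counts and ordering described in the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: builds the list of tasks created when a job is initialized.
class JobInitSketch {
    enum TaskType { JOB_SETUP, MAP, REDUCE, JOB_CLEANUP }

    // One setup task, one map task per split, the configured number of
    // reduce tasks, and one cleanup task, in that order.
    static List<TaskType> initTasks(int numSplits, int numReduceTasks) {
        if (numSplits < 0 || numReduceTasks < 0) {
            throw new IllegalArgumentException("task counts must not be negative");
        }
        List<TaskType> tasks = new ArrayList<>();
        tasks.add(TaskType.JOB_SETUP);
        for (int i = 0; i < numSplits; i++) {
            tasks.add(TaskType.MAP);
        }
        for (int i = 0; i < numReduceTasks; i++) {
            tasks.add(TaskType.REDUCE);
        }
        tasks.add(TaskType.JOB_CLEANUP);
        return tasks;
    }

    public static void main(String[] args) {
        // Three input splits and the default of one reduce task.
        System.out.println(initTasks(3, 1));
    }
}
```

For three splits and one reducer, the model yields five tasks in total: setup, three maps, one reduce, and cleanup.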