Elephants can remember: MapReduce Job History in HDP 2.0
An important tool in the Hadoop developer's toolkit is the ability to examine the key metrics of each MapReduce job, both to understand how that job performed and to optimize future runs.
Changes from MapReduce v1 and HDP 1.x
In MapReduce-v2 on YARN in HDP 2.0, the JobTracker no longer exists. Job life-cycle management is now the responsibility of short-lived Application Masters: each MapReduce-v2 job spins up its own Application Master, which is terminated once the job completes.
For this reason, a new MapReduce JobHistory server was added for MapReduce-v2 to maintain information about MapReduce jobs after their Application Masters terminate. Once a job's Application Master has completed, the ResourceManager web UI forwards requests for that job to the JobHistory server.
Viewing Job History in Ambari
With HDP 2.0, Ambari provides a screen to manage and monitor the JobHistory Server.
The JobHistory UI is accessible as a link from this screen. The JobHistory UI lists all executed MapReduce2 jobs.
You can drill down into each job to get the detailed metrics about the job runtime.
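The same job list is also available programmatically through the JobHistory server's REST API. The sketch below is a minimal illustration, assuming the default web port 19888 (mapreduce.jobhistory.webapp.address) and a placeholder host name, jhs.example.com; the helper names are hypothetical. It assumes the list endpoint /ws/v1/history/mapreduce/jobs returns JSON shaped like {"jobs": {"job": [...]}} with startTime and finishTime in epoch milliseconds, and it derives each job's wall-clock runtime from those two fields.

```python
import json
import urllib.request

# Assumption: default JobHistory web port 19888; "jhs.example.com" is a
# hypothetical placeholder host, not a real cluster address.
JHS_BASE = "http://jhs.example.com:19888"


def jobs_url(base: str = JHS_BASE) -> str:
    """Return the REST endpoint that lists completed MapReduce jobs."""
    return f"{base}/ws/v1/history/mapreduce/jobs"


def summarize_jobs(payload: dict) -> list:
    """Extract (job id, name, state, runtime in seconds) from a jobs response."""
    # The response wraps the list as {"jobs": {"job": [...]}}; treat a
    # missing, null, or empty wrapper as "no jobs".
    jobs = (payload.get("jobs") or {}).get("job") or []
    summary = []
    for job in jobs:
        # startTime and finishTime are epoch milliseconds.
        runtime_s = (job["finishTime"] - job["startTime"]) / 1000.0
        summary.append((job["id"], job["name"], job["state"], runtime_s))
    return summary


def fetch_jobs(base: str = JHS_BASE) -> list:
    """Fetch the job list from a live JobHistory server and summarize it."""
    with urllib.request.urlopen(jobs_url(base)) as resp:
        return summarize_jobs(json.load(resp))
```

Keeping the parsing in summarize_jobs, separate from the network call in fetch_jobs, makes it possible to try the logic on a saved response before pointing fetch_jobs at a real server.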
Job history data persisted to HDFS
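All of the underlying data for each job is persisted to HDFS. This means that historical operational metrics for each job are retained after the job's Application Master exits and remain accessible for as long as the JobHistory server keeps them (in Apache Hadoop 2 this is governed by mapreduce.jobhistory.max-age-ms, which defaults to one week).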
In HDP 2.0, the MapReduce job history files are stored in the "/mr-history/done" directory on HDFS. The directories are organized by the date on which each job ran.
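To find the history files for a given day, the date-based layout can be turned into a path. This is a minimal sketch under an assumption: it takes the nesting to be year/month/day (YYYY/MM/DD) below /mr-history/done, whereas the post only says the directories are organized by date, and Hadoop 2 may add further subdirectories below the day level. The function names are hypothetical; check the real layout on your cluster with hdfs dfs -ls /mr-history/done.

```python
from datetime import date, timedelta

# Root directory taken from the post; the YYYY/MM/DD nesting below it is an
# assumption for illustration.
DONE_ROOT = "/mr-history/done"


def done_dir_for(day: date, root: str = DONE_ROOT) -> str:
    """Return the assumed HDFS directory for jobs that ran on `day`."""
    return f"{root}/{day.year:04d}/{day.month:02d}/{day.day:02d}"


def done_dirs_between(start: date, end: date, root: str = DONE_ROOT) -> list:
    """Return the assumed per-day directories from start to end, inclusive."""
    span = (end - start).days
    return [done_dir_for(start + timedelta(days=d), root) for d in range(span + 1)]
```

A list produced by done_dirs_between could then be passed to hdfs dfs -ls to inspect a range of days, such as the week before a performance regression.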
Go Get It
Download HDP 2.0 Beta and deploy today!