Elephants can remember: MapReduce Job History in HDP 2.0

An important part of the Hadoop developer toolkit is the ability to inspect key metrics for a MapReduce job, both to understand how each job performed and to optimize future runs.

In this blog article, we’ll explore how HDP 2.0 stores and provides insight into the performance of a MapReduce job on YARN.

Change from MapReduce v1 and HDP 1.x

In MapReduce-v2 on YARN in HDP 2.0, the JobTracker no longer exists. Job life cycle management is now the responsibility of short-lived Application Masters. Each MapReduce-v2 job spins up its own Application Master, which is terminated once the job completes.

For this reason, a new MapReduce JobHistory Server was added for MapReduce-v2. It retains information about MapReduce jobs after their Application Master terminates. Once the Application Master has completed, the Resource Manager Web UI forwards requests for that job to the JobHistory Server.

Viewing Job History in Ambari

With HDP 2.0, Ambari provides a screen to manage and monitor the JobHistory Server.

[Screenshot: the JobHistory Server screen in Ambari]

The JobHistory UI is accessible as a link from this screen. The JobHistory UI lists all executed MapReduce2 jobs.

[Screenshot: the JobHistory UI listing executed MapReduce2 jobs]

You can drill down into each job to get the detailed metrics about the job runtime.

[Screenshot: detailed runtime metrics for a single job]
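The same per-job information can also be read programmatically. The sketch below is one possible approach, not something described in this article: it assumes the JobHistory Server exposes its REST listing at `/ws/v1/history/mapreduce/jobs` on the usual default web port 19888, and that each job entry carries `id`, `state`, `startTime` and `finishTime` fields (times in milliseconds). The host name and the sample payload are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumption: hypothetical host; 19888 is the usual default web port
# of the MapReduce JobHistory Server.
HISTORY_SERVER = "http://historyserver.example.com:19888"


def jobs_url(base=HISTORY_SERVER):
    """Build the URL that lists finished MapReduce jobs."""
    return base.rstrip("/") + "/ws/v1/history/mapreduce/jobs"


def summarize_jobs(payload):
    """Turn the JSON listing into (job id, state, runtime in seconds) tuples."""
    # The "jobs" value may be missing or null when no jobs are recorded.
    jobs = (payload.get("jobs") or {}).get("job", [])
    return [
        (job["id"], job["state"], (job["finishTime"] - job["startTime"]) / 1000.0)
        for job in jobs
    ]


def fetch_job_summaries(base=HISTORY_SERVER):
    """Query a live JobHistory Server (needs network access to the cluster)."""
    request = Request(jobs_url(base), headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return summarize_jobs(json.load(response))


# Hand-written sample shaped like the assumed listing response.
sample = {"jobs": {"job": [
    {"id": "job_1380000000000_0001", "state": "SUCCEEDED",
     "startTime": 1380000005000, "finishTime": 1380000065000},
]}}
print(summarize_jobs(sample))
```

Against a real cluster, `fetch_job_summaries("http://<your-history-server>:19888")` would replace the sample payload. Keeping the parsing in `summarize_jobs` separate from the network call makes it easy to check the logic without a running cluster.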

Job history data persisted to HDFS

All the underlying data for each job is persisted to HDFS. This means that historical operational metrics for each job are maintained and remain accessible for the lifetime of the HDP cluster.

In HDP 2.0, the MapReduce job history files are stored in the “/mr-history/done” directory on HDFS. The directories are organized by the date on which the job ran:

[Screenshot: date-organized directories under /mr-history/done on HDFS]
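To make the date-based layout concrete, here is a minimal sketch that builds the directory for a given day and the matching `hdfs dfs -ls -R` command. The year/month/day nesting with zero-padded components is an assumption about how the date directories are laid out, and the helper names `done_dir_for` and `list_command` are hypothetical. Only the `/mr-history/done` root comes from the text above.

```python
from datetime import date

# From the article: root of finished-job history files in HDP 2.0.
DONE_DIR = "/mr-history/done"


def done_dir_for(day, root=DONE_DIR):
    """Return the assumed per-day directory for jobs that ran on `day`."""
    # Assumption: year/month/day nesting with zero-padded components.
    return "{}/{:04d}/{:02d}/{:02d}".format(
        root.rstrip("/"), day.year, day.month, day.day
    )


def list_command(day):
    """Build (but do not run) an `hdfs dfs -ls -R` command for that directory."""
    return ["hdfs", "dfs", "-ls", "-R", done_dir_for(day)]


# Print the command for an example date; run it on a cluster node to list files.
print(" ".join(list_command(date(2013, 10, 23))))
```

The command is only printed, not executed, so the sketch runs without a Hadoop client installed. On a cluster node the resulting argument list could be passed to `subprocess.run`.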

Go Get It

Download HDP 2.0 Beta and deploy today!
