Elephants can remember: MapReduce Job History in HDP 2.0

An important part of the Hadoop developer toolkit is the ability to examine key metrics for a MapReduce job, both to understand how each job performed and to optimize future runs.

In this blog article, we’ll explore how HDP 2.0 stores MapReduce job history and uses it to provide insight into the performance of MapReduce jobs on YARN.

Change from MapReduce v1 and HDP 1.x

In MapReduce-v2 on YARN in HDP 2.0, the JobTracker no longer exists. Job life cycle management is now the responsibility of short-lived Application Masters: each MapReduce-v2 job spins up its own Application Master, which is terminated once the job completes.

For this reason, MapReduce-v2 adds a new MapReduce JobHistory Server, which maintains information about MapReduce jobs after their Application Masters terminate. Once a job's Application Master has exited, the ResourceManager web UI forwards requests for that job to the JobHistory Server.
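
Because a finished job's details live on the JobHistory Server rather than in its terminated Application Master, a client can also ask the JobHistory Server for them directly. The following is a minimal sketch, assuming the server's REST API path /ws/v1/history/mapreduce/jobs/{jobid} and its default web port 19888 (both are assumptions about your deployment); the host name and job ID in the usage line are hypothetical.

```python
import json
import urllib.request

# Assumed default web port of the MapReduce JobHistory Server
# (i.e. mapreduce.jobhistory.webapp.address has not been overridden).
JHS_PORT = 19888


def job_url(host: str, job_id: str, port: int = JHS_PORT) -> str:
    """Build the JobHistory Server REST URL for one completed job."""
    return f"http://{host}:{port}/ws/v1/history/mapreduce/jobs/{job_id}"


def fetch_job(host: str, job_id: str, port: int = JHS_PORT, timeout: float = 10.0) -> dict:
    """Fetch a finished job's record; the JSON body wraps it as {"job": {...}}."""
    request = urllib.request.Request(
        job_url(host, job_id, port), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["job"]


if __name__ == "__main__":
    # Hypothetical host and job ID, for illustration only (no request is made).
    print(job_url("historyserver.example.com", "job_1380000000000_0001"))
```

Keeping URL construction separate from the network call makes the addressing logic easy to check without a live cluster.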

Viewing Job History in Ambari

With HDP 2.0, Ambari provides a screen to manage and monitor the JobHistory Server.

This screen links to the JobHistory UI, which lists the MapReduce2 jobs that have run on the cluster.
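
The same job list can be retrieved programmatically. As a minimal sketch, assuming the JobHistory Server REST endpoint /ws/v1/history/mapreduce/jobs on the default port 19888, the helpers below build the request URL (optionally filtered with the API's "user" and "state" query parameters) and flatten the {"jobs": {"job": [...]}} response into simple tuples. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def jobs_url(host: str, port: int = 19888, user: Optional[str] = None,
             state: Optional[str] = None) -> str:
    """Build the URL that lists completed MapReduce jobs, with optional filters."""
    params = {k: v for k, v in (("user", user), ("state", state)) if v is not None}
    query = f"?{urllib.parse.urlencode(params)}" if params else ""
    return f"http://{host}:{port}/ws/v1/history/mapreduce/jobs{query}"


def summarize_jobs(payload: dict) -> list:
    """Turn a {"jobs": {"job": [...]}} response into (id, name, user, state) tuples."""
    jobs = (payload.get("jobs") or {}).get("job") or []
    if isinstance(jobs, dict):  # defensively handle a lone job serialized as an object
        jobs = [jobs]
    return [(j["id"], j["name"], j["user"], j["state"]) for j in jobs]


def list_jobs(host: str, **filters) -> list:
    """Fetch and summarize the job list (performs a network request)."""
    with urllib.request.urlopen(jobs_url(host, **filters), timeout=10) as response:
        return summarize_jobs(json.load(response))
```

For example, list_jobs("historyserver.example.com", state="FAILED") would return only failed jobs, assuming that (hypothetical) host is reachable.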

From there, you can drill down into each job to see detailed metrics about its run.
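
To illustrate the kind of per-job figures available, the sketch below derives a few simple metrics from a job record shaped like the JobHistory Server REST response: "startTime" and "finishTime" in milliseconds since the epoch, plus "mapsTotal"/"mapsCompleted" and "reducesTotal"/"reducesCompleted". The JobRuntime type and runtime_summary function are hypothetical names, and the sample record is made up.

```python
from dataclasses import dataclass


@dataclass
class JobRuntime:
    elapsed_seconds: float   # wall-clock time from job start to finish
    map_progress: float      # fraction of map tasks that completed
    reduce_progress: float   # fraction of reduce tasks that completed


def _fraction(done: int, total: int) -> float:
    # A map-only job has zero reducers; treat "0 of 0" as fully complete.
    return 1.0 if total == 0 else done / total


def runtime_summary(job: dict) -> JobRuntime:
    """Derive simple metrics from a JobHistory Server job record."""
    return JobRuntime(
        elapsed_seconds=(job["finishTime"] - job["startTime"]) / 1000.0,
        map_progress=_fraction(job["mapsCompleted"], job["mapsTotal"]),
        reduce_progress=_fraction(job["reducesCompleted"], job["reducesTotal"]),
    )


if __name__ == "__main__":
    # Made-up sample record for illustration.
    sample = {"startTime": 1_380_000_000_000, "finishTime": 1_380_000_095_500,
              "mapsTotal": 8, "mapsCompleted": 8,
              "reducesTotal": 0, "reducesCompleted": 0}
    print(runtime_summary(sample))
```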

Job history data persisted to HDFS

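All the underlying data for each job is persisted to HDFS. This means that historical operational metrics for each job are maintained and remain accessible after the job completes, for as long as the JobHistory Server's retention policy (“mapreduce.jobhistory.max-age-ms”) keeps them.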

In HDP 2.0, the MapReduce job history files are stored in the “/mr-history/done” directory on HDFS. Beneath it, the files are organized into date-based (year/month/day) subdirectories.
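
Because the done directory is partitioned by date, locating the history files for a given day comes down to building a path. This is a minimal sketch, assuming the HDP 2.0 root “/mr-history/done” (configurable via “mapreduce.jobhistory.done-dir”) and zero-padded YYYY/MM/DD subdirectories. The JobHistory Server nests further subdirectories beneath each day, so treat the result as a prefix to list recursively, for example with hdfs dfs -ls -R. The done_dir_for helper is hypothetical.

```python
from datetime import date
from posixpath import join


def done_dir_for(day: date, root: str = "/mr-history/done") -> str:
    """Return the HDFS directory prefix holding job history files for one day.

    Assumes the YYYY/MM/DD layout; further subdirectories sit below it.
    """
    return join(root, f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}")


if __name__ == "__main__":
    # Print the prefix for an arbitrary example date.
    print(done_dir_for(date(2013, 9, 5)))
```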

Go Get It

Download HDP 2.0 Beta and deploy today!