Understanding NameNode Startup Operations in HDFS

Before I was a developer of Hadoop, I was a user of Hadoop.  I was responsible for operation and maintenance of multiple Hadoop clusters, so it’s very satisfying when I get the opportunity to implement features that make life easier for operations staff.

Have you ever wondered what’s happening during a namenode restart?  A new feature coming in HDP 2.0 will give operators greater visibility into this critical process.  This is a feature that would have been very useful to me in my prior role.

Motivation

During startup, the NameNode must complete certain actions before it can serve client requests:

  1. Read file system metadata from the fsimage file.
  2. Read edit logs and apply logged operations to the file system metadata.
  3. Write a new checkpoint (a new fsimage consisting of the prior fsimage plus the application of all operations from the edit logs).
  4. Remain in safe mode until a sufficient number of blocks have been reported by datanodes.

In some situations, these actions can take a long time to complete.  For example:

  • If the edit logs have grown very large, then reading all of the operations and applying them to the metadata will take a long time.  This can occur if there has not been a recent checkpoint, such as due to a long-term outage of the secondary namenode.
  • A degraded disk can slow down performance in any of the I/O-bound steps: reading fsimage, reading edit logs, or writing a new checkpoint.
  • When writing a new checkpoint, the NameNode writes to every one of the redundant locations configured for storing fsimage.  Even though these writes occur in parallel, the NameNode blocks until the writes have completed to every location before it allows client connections.  This means a single slow disk inhibits overall startup performance.
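The blocking behavior in that last bullet can be sketched in a few lines. This is an illustrative model only, not the NameNode's actual Java implementation: the directory paths and the `fake_write` delays are made up to simulate one degraded disk among several healthy ones.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def write_all_checkpoints(storage_dirs, write_image):
    """Write the checkpoint image to every configured storage directory
    in parallel, then block until ALL writes finish.  As described above,
    overall latency is governed by the slowest disk.  (write_image is a
    stand-in for the real image writer.)"""
    with ThreadPoolExecutor(max_workers=len(storage_dirs)) as pool:
        futures = [pool.submit(write_image, d) for d in storage_dirs]
        # Do not proceed (e.g. to accepting client connections) until
        # every configured location has been written.
        return [f.result() for f in futures]

# Simulate two healthy disks and one degraded disk (hypothetical paths/delays).
def fake_write(directory):
    delays = {"/data1/dfs/name": 0.01, "/data2/dfs/name": 0.01, "/slow/dfs/name": 0.2}
    time.sleep(delays[directory])
    return directory

dirs = ["/data1/dfs/name", "/data2/dfs/name", "/slow/dfs/name"]
start = time.monotonic()
written = write_all_checkpoints(dirs, fake_write)
elapsed = time.monotonic() - start  # gated by the 0.2s "slow" disk, not the 0.01s ones
```

Even though two of the three writes finish almost instantly, the total elapsed time tracks the slowest location, which is exactly why a single failing disk can make startup appear to hang.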

Administrators typically check the NameNode web UI at the first sign of trouble.  Unfortunately, the NameNode wouldn't start its HTTP server until after writing a new checkpoint.  In a slow startup situation, it could take many minutes or even more than an hour after restarting the NameNode before the web UI became accessible, making it appear as though the NameNode process had hung during startup.  Only an experienced Hadoop operator could determine that the NameNode was in fact making progress, by using relatively low-level techniques such as inspecting thread dumps.

Solution

After fielding several support calls related to this, Hortonworks engineers filed Apache JIRA HDFS-4249 for a new feature to track details about NameNode startup progress and make the information available to end users.  The implementation of this feature also moved startup of the NameNode’s HTTP server much earlier in the startup sequence, so users can access the web UI to observe startup progress immediately after restarting the process.

Here we see a typical display from the NameNode web UI during startup, showing completion and progress of the various steps executed during startup.  In this case, the NameNode has completed loading fsimage and is 70% done loading edits.  After that, it will proceed to saving a new checkpoint and entering safe mode.  Completed phases are displayed in bold text.  The currently running phase is displayed in italics.  Phases that have not yet begun are displayed in gray text.

The UI also shows key information, such as where the fsimage and edits were loaded from, the sizes of those files, and the number of inodes loaded during startup.  During checkpointing, it likewise shows each location to which the checkpointed image is written and the size of the image.

We decided to expose the same data in a machine-readable JSON format, accessible via the new /startupProgress URL.  This enables future integration with Hadoop management tools such as Apache Ambari for a richer user experience than the NameNode web UI can provide.
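As a sketch of how a monitoring script might consume this endpoint: the payload below is hypothetical and abbreviated. The phase names and fields follow the format introduced by HDFS-4249, but treat the exact shape as an assumption and compare it against the /startupProgress output of a live NameNode.

```python
import json

# Hypothetical, abbreviated payload in the shape served at
# http://<namenode>:50070/startupProgress (verify against your cluster).
sample = json.loads("""
{
  "percentComplete": 0.42,
  "phases": [
    {"name": "LoadingFsImage",   "status": "COMPLETE", "percentComplete": 1.0},
    {"name": "LoadingEdits",     "status": "RUNNING",  "percentComplete": 0.7},
    {"name": "SavingCheckpoint", "status": "PENDING",  "percentComplete": 0.0},
    {"name": "SafeMode",         "status": "PENDING",  "percentComplete": 0.0}
  ]
}
""")

def summarize(progress):
    """Render one human-readable status line per startup phase."""
    return ["%s: %s (%.0f%%)" % (p["name"], p["status"], 100 * p["percentComplete"])
            for p in progress["phases"]]

for line in summarize(sample):
    print(line)
# First line printed: LoadingFsImage: COMPLETE (100%)
```

A tool like Ambari can poll this endpoint during a restart and drive a progress bar directly from `percentComplete`, instead of scraping the human-oriented web UI.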

We also chose to publish this information via JMX for ease of integration with a wide variety of enterprise monitoring tools.  Here we can see jconsole inspecting the new counters.
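For scripts that prefer HTTP over a full JMX client, the NameNode also serves its JMX metrics through the /jmx servlet. The bean and attribute names below are assumptions based on the StartupProgress metrics this feature registers; verify them in jconsole or by fetching /jmx from your own NameNode.

```python
import json

# Assumed query URL for the startup-progress bean (check your NameNode's
# hostname, port, and the exact bean name in its /jmx output):
JMX_URL = "http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=StartupProgress"

# Hypothetical, abbreviated /jmx response body:
sample_response = json.loads("""
{"beans": [{
  "name": "Hadoop:service=NameNode,name=StartupProgress",
  "ElapsedTime": 41000,
  "PercentComplete": 0.7
}]}
""")

def startup_percent_complete(jmx_response):
    """Extract overall startup completion (0.0-1.0) from a /jmx response,
    or None if the StartupProgress bean is not present."""
    for bean in jmx_response.get("beans", []):
        if bean.get("name", "").endswith("name=StartupProgress"):
            return bean.get("PercentComplete")
    return None

print(startup_percent_complete(sample_response))  # prints 0.7
```

Because enterprise monitoring tools commonly speak JMX already, they can alert on these counters without any Hadoop-specific integration work.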

Conclusion

The HDFS team at Hortonworks constantly searches for new ways to make it easier to manage distributed storage.  This new feature greatly increases visibility into the NameNode’s startup sequence for operators who manage Hadoop clusters.  We hope you find this feature as useful in your own clusters as we do in ours.

Comments

David Goyal | February 7, 2014 at 7:02 pm

Is this feature accessible from HDP 2.0 Windows install?

    February 7, 2014 at 10:28 pm

    Hi, David. Yes, this feature is available across all supported platforms, including Windows.
