Understanding NameNode Startup Operations in HDFS
Before I was a developer of Hadoop, I was a user of Hadoop. I was responsible for operation and maintenance of multiple Hadoop clusters, so it’s very satisfying when I get the opportunity to implement features that make life easier for operations staff.
Have you ever wondered what’s happening during a NameNode restart? A new feature coming in HDP 2.0 will give operators greater visibility into this critical process. This is a feature that would have been very useful to me in my prior role.
During startup, the NameNode must complete certain actions before it can serve client requests:
- Read file system metadata from the fsimage file.
- Read edit logs and apply logged operations to the file system metadata.
- Write a new checkpoint (a new fsimage consisting of the prior fsimage plus the application of all operations from the edit logs).
- Remain in safe mode until a sufficient number of blocks have been reported by DataNodes.
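To make the last step concrete, here is a minimal sketch of the kind of threshold check that decides when safe mode can end. It assumes the threshold is a fraction of total blocks, as controlled by the `dfs.namenode.safemode.threshold-pct` property (default 0.999). The function name and parameters are hypothetical, and the real NameNode also applies other conditions (such as a minimum number of live DataNodes and an extension period) that are ignored here.

```python
def safe_mode_threshold_reached(reported_blocks: int,
                                total_blocks: int,
                                threshold_pct: float = 0.999) -> bool:
    """Return True once enough blocks have been reported.

    Simplified illustration only: the actual NameNode logic has
    additional conditions that this sketch does not model.
    """
    return reported_blocks >= threshold_pct * total_blocks


# Example: with 1,000 blocks and a 50% threshold, 500 reports are enough.
print(safe_mode_threshold_reached(499, 1000, threshold_pct=0.5))
print(safe_mode_threshold_reached(500, 1000, threshold_pct=0.5))
```

With the default threshold, nearly every block must be reported before clients are served, which is why this phase can dominate startup time on large clusters.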
In some situations, these actions can take a long time to complete. For example:
- If the edit logs have grown very large, then reading all of the operations and applying them to the metadata will take a long time. This can occur if there has not been a recent checkpoint, for example because of a long-term outage of the Secondary NameNode.
- A degraded disk can slow down performance in any of the I/O-bound steps: reading fsimage, reading edit logs, or writing a new checkpoint.
- When writing a new checkpoint, the NameNode writes to every one of the redundant locations configured for storing fsimage. Even though these writes occur in parallel, the NameNode blocks until the writes have completed to every location before it allows client connections. This means a single slow disk inhibits overall startup performance.
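The last point is worth illustrating. The sketch below is not NameNode code. It simulates parallel writes with sleeps, using hypothetical delays in place of real disks, to show that waiting for all writers means the total time tracks the slowest one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def write_checkpoint(delay_seconds: float) -> float:
    """Stand-in for writing fsimage to one storage location."""
    time.sleep(delay_seconds)  # simulated disk latency
    return delay_seconds


def save_to_all_locations(delays: list[float]) -> float:
    """Start all writes in parallel and block until every one finishes."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        # Consuming the iterator waits for every write to complete.
        list(pool.map(write_checkpoint, delays))
    return time.monotonic() - start


# Two healthy "disks" and one degraded one (hypothetical timings).
elapsed = save_to_all_locations([0.2, 0.2, 0.4])
print(f"elapsed: {elapsed:.2f}s")
```

The elapsed time lands near the slowest delay rather than the sum of all delays: parallelism helps, but it cannot hide a single slow location.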
Administrators typically access the NameNode web UI at the first sign of trouble. Unfortunately, the NameNode wouldn’t start its HTTP server until after writing a new checkpoint. In a slow startup situation, it could take multiple minutes or even more than an hour after restarting the NameNode before the web UI would be accessible. It would appear as though the NameNode process had hung during startup. Only an experienced Hadoop operator would be able to determine that the NameNode is in fact making progress, by using relatively low-level techniques such as inspecting thread dumps.
After fielding several support calls related to this, Hortonworks engineers filed Apache JIRA HDFS-4249 for a new feature to track details about NameNode startup progress and make the information available to end users. The implementation of this feature also moved startup of the NameNode’s HTTP server much earlier in the startup sequence, so users can access the web UI to observe startup progress immediately after restarting the process.
Here we see a typical display from the NameNode web UI during startup, showing completion and progress of the various steps executed during startup. In this case, the NameNode has completed loading fsimage and is 70% done loading edits. After that, it will proceed to saving a new checkpoint and entering safe mode. Completed phases are displayed in bold text. The currently running phase is displayed in italics. Phases that have not yet begun are displayed in gray text.
The UI also shows key details, such as where the fsimage and edits files were loaded from, the size of those files, and the number of inodes loaded during startup. During checkpointing, it also shows every location where the checkpoint image is written, along with the size of the image.
We decided to expose the same data in a machine-readable JSON format accessible via the new /startupProgress URL. This enables future integration with Hadoop management tools such as Apache Ambari for a richer user experience than can be provided in the NameNode web UI.
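As a rough illustration of how a tool might consume this endpoint, here is a small Python sketch. The field names (`phases`, `name`, `status`, `percentComplete`), the phase names, and the sample document are assumptions made for illustration, as are the host and port in the usage comment. Check the actual response from your own NameNode before relying on them.

```python
import json
from urllib.request import urlopen


def fetch_startup_progress(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and parse the JSON document served at /startupProgress."""
    url = base_url.rstrip("/") + "/startupProgress"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(progress: dict) -> list[str]:
    """Render one line per phase; the field names are assumed, not confirmed."""
    lines = []
    for phase in progress.get("phases", []):
        pct = 100.0 * float(phase.get("percentComplete", 0.0))
        name = phase.get("name", "?")
        status = phase.get("status", "?")
        lines.append(f"{name}: {status} ({pct:.0f}%)")
    return lines


# Hypothetical sample mirroring the scenario described above.
SAMPLE = json.loads("""
{
  "percentComplete": 0.4,
  "phases": [
    {"name": "LoadingFsImage",   "status": "COMPLETE", "percentComplete": 1.0},
    {"name": "LoadingEdits",     "status": "RUNNING",  "percentComplete": 0.7},
    {"name": "SavingCheckpoint", "status": "PENDING",  "percentComplete": 0.0},
    {"name": "SafeMode",         "status": "PENDING",  "percentComplete": 0.0}
  ]
}
""")

for line in summarize(SAMPLE):
    print(line)

# Against a live cluster (hypothetical host and port):
# print(summarize(fetch_startup_progress("http://namenode.example.com:50070")))
```

Keeping the fetch and the formatting separate makes it easy to test the formatting against a saved response without a running cluster.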
We also chose to publish this information via JMX for ease of integration with a wide variety of enterprise monitoring tools. Here we can see jconsole inspecting the new counters.
The HDFS team at Hortonworks constantly searches for new ways to make it easier to manage distributed storage. This new feature greatly increases visibility into the NameNode’s startup sequence for operators who manage Hadoop clusters. We hope you find this feature as useful in your own clusters as we do in ours.