October 14, 2013

NameNode High Availability in HDP 2.0

As part of a modern data architecture, Hadoop needs to be a good citizen, trusted at the heart of the business. This means it must provide all the platform services and features expected of an enterprise data platform.

The Hadoop Distributed File System is the rock at the core of HDP and provides reliable, scalable access to data for all analytical processing needs. With HDP 2.0, HDFS now has automated failover with a hot standby and full-stack resiliency, built into the platform itself. In this blog post we’ll break this down and outline what it means for your HDP operations and cluster design.

HDP 2.0 NameNode High Availability: Feature Overview

Automated Failover: HDP proactively detects NameNode host and process failures and will automatically switch to the standby NameNode to maintain availability for the HDFS service. There is no need for human intervention in the process – System Administrators can sleep in peace!

Hot Standby: Both the Active and Standby NameNodes have up-to-date HDFS metadata, ensuring seamless failover even for large clusters – which means no downtime for your HDP cluster!

Full Stack Resiliency: The entire HDP stack (MapReduce, Hive, Pig, HBase, Oozie) has been certified to handle a NameNode failure scenario without losing data or job progress. This is vital to ensure that long-running jobs which must complete on schedule are not adversely affected by a NameNode failure.

With HDP 2.0, this entire functionality is built into the HDP product. This means:

  • No need for shared storage
  • No need for external HA frameworks
  • Certified with a secure Kerberos configuration

High Availability Architecture Deep Dive

Let’s walk through the architecture. This will outline the different components that are built into HDP to provide this functionality.

In each cluster, two separate machines are configured as NameNodes. In a working cluster, one of the NameNode machines is in the Active state, and the other is in the Standby state.

The Active NameNode is responsible for all client operations in the cluster. The Standby NameNode maintains enough state to provide a fast failover. In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate through a group of separate daemons called JournalNodes. The file system journal logged by the Active NameNode at the JournalNodes is consumed by the Standby NameNode to keep its file system namespace in sync with the Active.

In order to provide a fast failover, it is also necessary that the Standby node have up-to-date information about the location of blocks in your cluster. DataNodes are therefore configured with the location of both NameNodes, and they send block location information and heartbeats to both NameNode machines.
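
As a rough sketch of what this looks like in configuration, the logical nameservice, its two NameNodes, and the shared JournalNode edits directory are declared in hdfs-site.xml along the following lines (the nameservice name mycluster and the host names nn1-host, nn2-host, jn1, jn2 and jn3 are placeholders, not values from this post):

<!-- hdfs-site.xml: logical nameservice and its two NameNodes -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1-host:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2-host:8020</value>
</property>
<!-- shared edits directory served by the JournalNode quorum -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>

DataNodes read this same nameservice configuration, which is how they learn the addresses of both NameNodes.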

[Figure: NameNode HA architecture]

The ZooKeeper Failover Controller (ZKFC) is responsible for HA monitoring of the NameNode service and for automatic failover when the Active NameNode is unavailable. There are two ZKFC processes – one on each NameNode machine. ZKFC uses the ZooKeeper service for coordination, both to determine which NameNode is Active and to decide when to fail over to the Standby NameNode.
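
For reference, automatic failover with ZKFC is typically switched on by one property in hdfs-site.xml plus the ZooKeeper quorum in core-site.xml; a minimal sketch (the ZooKeeper host names zk1, zk2 and zk3 are placeholders):

<!-- hdfs-site.xml: let ZKFC drive failover automatically -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml: ZooKeeper ensemble used by ZKFC for coordination -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>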

The Quorum Journal Manager (QJM) in the NameNode writes file system journal logs to the JournalNodes. A journal entry is considered successfully written only when it has been written to a majority of the JournalNodes (for example, with three JournalNodes, a write must reach at least two). Only one of the NameNodes can achieve this quorum write at any time. In the event of a split-brain scenario, this ensures that the file system metadata will not be corrupted by two active NameNodes.

In an HA setup, HDFS clients are configured with a logical nameservice URI and the two NameNodes corresponding to it. The clients perform client-side failover: when a client cannot connect to a NameNode, or the NameNode it reaches is in standby mode, it fails over to the other NameNode.
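
In configuration terms, clients see only the logical nameservice: fs.defaultFS points at the nameservice URI, and a failover proxy provider class tells the client library how to try the two NameNodes in turn. A sketch, reusing the placeholder nameservice name mycluster from above:

<!-- core-site.xml: clients address the nameservice, not an individual NameNode -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

<!-- hdfs-site.xml: proxy provider that performs the client-side failover -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>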

Go Get It

HDP 2.0 brings advancements in reliability to the core of the Hortonworks Data Platform.

Check out our in-depth documentation on enabling a NameNode HA cluster.

Download HDP 2.0 Beta and deploy today!


Comments

Nikhil says:

Why is the failover controller called ZKFC? It’s not trying to fail over the ZooKeeper nodes; it’s trying to fail over the NameNode. Shouldn’t it be called a NameNode Failover Controller, or NnFC, instead?

Another example where the Hadoop community has shown bad taste with naming conventions (the other being the Secondary NameNode) 😉

Also, is it possible to have more than 2 NameNodes in the picture, for example 3 NameNodes? Just to be on the safer side: when the active NameNode is down and there is a failover, there is still a risk while a single node is active.
Theoretically, since the NameNode is integrated with the Quorum Journal Manager, it should be possible to horizontally scale the availability of NameNodes, no?

> When a client cannot connect to a NameNode, or the NameNode it reaches is in standby mode, it fails over to the other NameNode.

How does the client know which NameNode to connect to during failover? Since heartbeats are being sent to both NameNodes, what happens during the failover? And how does the client learn of the re-availability of a NameNode during failback, or when the service is simply brought up on another node? What if another node with NameNode services has to be brought up during the failover (assuming there is a total hardware failure on one node)? Do the clients contact the ZooKeeper service or the JournalNodes?

Thanks,
Nikhil

Balaswamy vaddeman says:

I agree. I think they have to revisit the naming conventions to make them easier for people to understand.

StanleyJohns says:

@Nikhil: “How does the client know which NameNode to connect to during failover?”
ZKFC initially knows which is the active and which is the standby node and maintains the heartbeat of both. When the active node goes down, the standby becomes the (new) active NameNode. This gives the engineer time to troubleshoot/fix the ‘disabled’ node. Once fixed, the (old) active node is now the standby node, and the (old) standby node is still the active node. So you still have the logical active/standby nodes, but physically they are different servers.
You may choose to do a manual failover so as to go back to the initial physical setup, e.g. if your nomenclature is name-node1 (active) and name-node2 (passive), you may always want name-node1 to be in the active state. Although a manual failover of a production server still requires nerves of steel.
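For example, you can check which node is currently active and trigger a manual failover with the hdfs haadmin tool (nn1 and nn2 below stand for whatever NameNode ids are configured in your cluster):
>hdfs haadmin -getServiceState nn1
>hdfs haadmin -getServiceState nn2
>hdfs haadmin -failover nn1 nn2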
“Why is the failover controller called ZKFC?” I would argue that since ZooKeeper is the centralized configuration service, the ZK Failover Controller naming makes sense. You would want the FC to be separate from the NameNodes; i.e. how else would you monitor/fail over the NameNode going down if the FC was on the NameNode itself?
“Is it possible to have more than 2 NameNodes in the picture, for example 3 NameNodes?” I agree, this would be ideal. Maybe in the next version.

Jesse Hu says:

Hi there,

I didn’t find the hadoop-hdfs-journalnode package on
http://s3.amazonaws.com/public-repo-1.hortonworks.com/index.html#/HDP/centos6/2.x/GA/2.0.6.0/hadoop. How can I install the hadoop-hdfs-journalnode package to deploy a NameNode HA cluster if not using Ambari?

Thanks.

Jesse Hu says:

Hi, I didn’t find the hadoop-hdfs-journalnode package on http://s3.amazonaws.com/public-repo-1.hortonworks.com/index.html#/HDP/centos6/2.x/updates/2.0.6.0/hadoop, while other Bigtop-based Hadoop distributions (e.g. Pivotal HD, CDH4) have the journalnode package. How can I manually deploy an HDP 2.0 NameNode HA cluster without the hadoop-hdfs-journalnode package?

Thanks.

Raj Kumar Gupta says:

Hi team,

Please correct my understanding…

In this model, the Active NameNode will keep taking the edits from the JournalNodes (which may be on a different physical box) and will apply the latest edits to its own FSImage every 1 hour or at a size of 128 MB.

There is no separate Secondary NameNode process in an HA cluster; only the pair of NameNodes works, because the Standby NameNode also has its own copy of the FSImage and keeps synchronizing with the edit logs from the JournalNodes.
How will the checkpoint work in this case?

Thanks & Regards,
Raj

Smarak Das says:

Raj,
As far as I understand, the Secondary NameNode fetches the edit logs & FSImage from the Active NameNode, merges the transactions from the edit logs into the FSImage locally, & flushes the updated FSImage back to the Active NameNode. This is the only activity of the Secondary NameNode. In an HA cluster, the Standby NameNode performs this checkpointing as well.
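
If it helps, the checkpoint cadence in an HA cluster is still governed by the usual hdfs-site.xml properties; the values shown here are just the common Hadoop 2.x defaults, not anything HDP-specific:

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value> <!-- seconds between checkpoints -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value> <!-- or after this many uncheckpointed transactions -->
</property>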

Vinod says:

Hi Guys,
Can anyone confirm whether safe mode is always ON for the Standby NameNode and OFF for the Active NameNode?

Abhinav Phutela says:

That is not the case. Safe mode for both the Active and Standby NameNode is off (always). You can check it with the CLI:
>hdfs dfsadmin -safemode get
Safe mode is OFF in test1.x.x/192.168.xxx.xxx:8020
Safe mode is OFF in test2.x.x/192.168.xxx.xxx:8020
Hope that solves your doubt.
