Posts by Jitendra Pandey:


Namenode HA Reaches a Major Milestone

We reached a significant milestone in HDFS: the Namenode HA branch was merged into the trunk. With this merge, HDFS trunk now supports HOT failover.

Significant enhancements were completed to make HOT Failover work:

  • Configuration changes for HA
  • Notion of active and standby states were added to the Namenode
  • Client-side redirection
  • Standby processing journal from Active
  • Dual block reports to Active and Standby

We have extensively tested HOT manual failover in our labs over the last few months. The HDFS team is now working on completing automatic failover. Please see HDFS-1623 for more details.

~Jitendra Pandey

RPC Improvements and Wire Compatibility in Apache Hadoop

Hadoop RPC is the primary communication mechanism between the nodes in an Apache Hadoop cluster. Maintaining wire compatibility, as new features are added to Apache Hadoop, has been a significant challenge with the current RPC architecture. In this blog, I highlight the architectural improvement in Hadoop RPC and how it enables wire compatibility and rolling upgrades.

Challenges for Wire Compatibility

Earlier Hadoop RPC used Writable serialization that made it difficult to evolve the protocols while maintaining wire compatibility. Hadoop RPC also did not distinguish between data types that are exchanged over the wire, and the types used on the client side or the server. This made it more complicated to maintain compatibility as new features that required changes to the common data types were added.

Read More

Fine-Tune Your Apache Hadoop Security Settings

Apache Hadoop is equipped with a robust and scalable security infrastructure. It is being used at some of the biggest cluster installations in the world, where hundreds of terabytes of sensitive and critical data are processed every day.

Owen O’Malley provided a nice overview of Apache Hadoop security in his blog Motivations for Apache Hadoop Security. Devaraj Das also covered some of the core pieces of Apache Hadoop’s security architecture in his blog The Role of Delegation Tokens in Apache Hadoop Security.

The intent of this blog is to cover some of the features of the Apache Hadoop security infrastructure that will help cluster administrators fine-tune the security settings of their clusters.

Read More