RPC Improvements and Wire Compatibility in Apache Hadoop

Hadoop RPC is the primary communication mechanism between the nodes of an Apache Hadoop cluster. Maintaining wire compatibility as new features are added to Apache Hadoop has been a significant challenge with the existing RPC architecture. In this blog, I highlight the architectural improvements in Hadoop RPC and how they enable wire compatibility and rolling upgrades.

Challenges for Wire Compatibility

Earlier, Hadoop RPC used Writable serialization, which made it difficult to evolve protocols while maintaining wire compatibility. Hadoop RPC also did not distinguish between the data types exchanged over the wire and the types used in client- or server-side code. This made compatibility harder to maintain whenever a new feature required changes to those shared data types.
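
To see the problem concretely, here is a minimal sketch of a Writable-style wire type (a hypothetical class, not actual Hadoop code). The byte layout is implicit in the order of the read and write calls, so any change to the field list breaks peers running the older version:

```java
// Hypothetical Writable wire type; illustrates positional serialization.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class BlockInfoWritable implements Writable {
  private long blockId;
  private long numBytes;
  // Adding a field (say, a generation stamp) requires a matching change in
  // both write() and readFields(); an old peer would misread the new bytes.

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(blockId);   // the wire format is simply the call order
    out.writeLong(numBytes);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    blockId = in.readLong();  // must mirror write() exactly
    numBytes = in.readLong();
  }
}
```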

New Approach and Benefits

The new approach separates the wire data types from the data types used in the service implementation or client code. This allows complete flexibility to enhance the server or client internals, and wire compatibility can be maintained independently of those changes.
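
A minimal sketch of this separation is shown below; the class names are hypothetical (in Hadoop the wire type would be a generated protocol buffer class, with translator helpers in this spirit), and a plain Java class stands in for the wire type so the snippet is self-contained:

```java
// Internal type: free to evolve along with the server implementation.
class BlockInfo {
  final long blockId;
  final long numBytes;
  BlockInfo(long blockId, long numBytes) { this.blockId = blockId; this.numBytes = numBytes; }
}

// Wire type: changes only when the protocol itself is revised.
class BlockInfoProto {
  final long blockId;
  final long numBytes;
  BlockInfoProto(long blockId, long numBytes) { this.blockId = blockId; this.numBytes = numBytes; }
}

// Translator: the only code that knows both representations.
final class PBHelper {
  static BlockInfoProto toProto(BlockInfo b) { return new BlockInfoProto(b.blockId, b.numBytes); }
  static BlockInfo fromProto(BlockInfoProto p) { return new BlockInfo(p.blockId, p.numBytes); }
}
```

Because the translator is the only code that sees both representations, the internal types can be refactored freely without touching the wire format.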

The new architecture also makes the serialization mechanism for the on-wire data types pluggable. The default serialization chosen for on-wire data types in Hadoop is protocol buffers.
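
For instance, the serialization engine can be bound per protocol through the configuration. The sketch below assumes the RPC.setProtocolEngine() API in the org.apache.hadoop.ipc package; MyProtocolPB is a hypothetical protocol interface:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.ProtobufRpcEngine;
import org.apache.hadoop.ipc.RPC;

public class EngineSelection {
  interface MyProtocolPB {}  // hypothetical protocol interface

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Bind this protocol to the protobuf engine; a Writable-based engine
    // could be plugged in the same way for legacy protocols.
    RPC.setProtocolEngine(conf, MyProtocolPB.class, ProtobufRpcEngine.class);
  }
}
```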

Protocol buffers are a proven technology with several strong benefits, such as efficiency, extensibility, compactness and cross-language support. They allow certain modifications to a protocol in a compatible way, such as adding new optional fields. Protocol buffers are widely supported across many popular languages, and pave the way for non-Java Hadoop clients. Early tests have shown significant performance gains with protocol buffers compared to Writables.
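
The compatibility property comes from protobuf's tag-based encoding: every field is written with a numeric tag, and a reader simply skips tags it does not recognize. The sketch below demonstrates this with the protobuf-java runtime directly (generated classes do the equivalent internally); the field numbers and values are illustrative:

```java
import com.google.protobuf.CodedInputStream;
import com.google.protobuf.CodedOutputStream;
import com.google.protobuf.WireFormat;
import java.io.ByteArrayOutputStream;

public class CompatDemo {
  public static void main(String[] args) throws Exception {
    // "v2" writer: field 1 = path, field 2 = a newly added optional field.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    CodedOutputStream out = CodedOutputStream.newInstance(bytes);
    out.writeString(1, "/user/data");
    out.writeUInt64(2, 4096L);          // unknown to v1 readers
    out.flush();

    // "v1" reader: knows only field 1 and skips everything else.
    CodedInputStream in = CodedInputStream.newInstance(bytes.toByteArray());
    int tag;
    while ((tag = in.readTag()) != 0) {
      if (WireFormat.getTagFieldNumber(tag) == 1) {
        System.out.println("src = " + in.readString());
      } else {
        in.skipField(tag);              // unknown field: skipped, not an error
      }
    }
  }
}
```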

Another significant benefit of the new design is that it enables rolling upgrades. This is critical in the context of Namenode high availability, where the active and standby Namenodes can be upgraded independently without disrupting the service.

Acknowledgements & Conclusion

This effort was a significant overhaul of the Hadoop RPC code. As per the current plan, it will be available in the 0.23.2 release of Apache Hadoop.

It was a fun project with my fellow Hortonworkers, Sanjay Radia and Suresh Srinivas.

~Jitendra Pandey


Comments

sambit | March 31, 2012 at 8:38 am

How about bringing in improvements over sockets using RDMA / SDP concepts? That would surely bring significant improvements for all interconnects between data nodes.
