Apache Hadoop 2.0.2-alpha Released!

It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.0.2-alpha.

This is the second (alpha) release in the next-generation Apache Hadoop 2.x line, and it comes with significant enhancements to both major components of Hadoop:

  • HDFS has undergone significant enhancements for NameNode High Availability (HA) since the previous release.
  • YARN has undergone significant testing, stabilization and validation, and has been heavily battle-tested since the previous release.

These are exciting times indeed for the Apache Hadoop community. Personally, this is very reminiscent of 2009, when we finally saw the light at the end of the tunnel while stabilizing Apache Hadoop 1.x (then called Apache Hadoop 0.20.x). Déjà vu, if you will, albeit of the pleasant kind! Yes, we still have a few miles to go, but it feels like the hardest part is already behind us. At the time of this release, YARN has already been deployed on super-sized clusters of 2,000 and 3,600 nodes (nearly 6,000 nodes in total) at Yahoo alone*.

Going forward, I have no doubt that we are well on our way to signing off on hadoop-2.x early next year. Now that we have a reasonably stable base, we are heads down wrapping up the last of the feature work, such as:

  • HDFS HA without need for shared storage (already merged into Hadoop trunk sans a couple of design enhancements).
  • YARN ResourceManager availability.
  • YARN scheduling enhancements such as multi-resource scheduling (nearly complete, should be committed soon) and preemption.

Having said that, it’s critical for the developer community to get feedback on hadoop-2.x from the user community to ensure we continue to deliver great software. So please go ahead: download the bits from the Apache Hadoop releases page, try the release and give us your valuable feedback. It’s a personal request! Of course, if you prefer a fully packaged and integrated stack, you can browse to the Hortonworks Downloads page to try Hortonworks Data Platform 2.0 Alpha, which integrates Hadoop 2.0.2-alpha with other important components such as Apache HBase, Apache Pig, Apache Hive, Apache HCatalog, Apache ZooKeeper and Apache Oozie.

For more information about HDP 2.0 Alpha, check out our blog post from yesterday.

Acknowledgements
I’d like to thank everyone in the community who has contributed, or continues to contribute, to Apache Hadoop. A special mention goes to Todd Lipcon for his contributions to HDFS HA, and to the Yahoo Hadoop team (Robert Evans, Thomas Graves, Daryn Sharp, Jason Lowe and everyone else) for their help in bringing YARN to stability and to large-scale deployment on their clusters.

*Yahoo is currently running the hadoop-0.23.4 release, which is essentially hadoop-2.0.2-alpha without HDFS High Availability.

Comments

spirit | October 16, 2012 at 11:06 pm

Congratulations
