Announcing Beta Release of Apache Hadoop 2

It’s my great pleasure to announce that the Apache Hadoop community has declared Hadoop 2.x as Beta with the vote closing over the weekend for the hadoop-2.1.0-beta release.

As noted in the announcement to the mailing lists, this is a significant milestone across multiple dimensions: not only is the release chock-full of significant features (see below), it also represents a very stable set of APIs and protocols on which we can continue to build for the future. In particular, the Apache Hadoop community has spent an enormous amount of time paying attention to the stability and long-term viability of our APIs and wire protocols for both HDFS and YARN. This is very important, as we've already seen huge interest in moving other frameworks (open-source and proprietary) atop YARN to process data and run services *in* Hadoop.

For folks who know me, it won't come as a surprise when I say that I could not be more excited to see Apache Hadoop YARN achieve the beta moniker – personal highlight of the year! This means that applications besides MapReduce, such as Apache Tez, HBase on YARN (HOYA), Storm-on-YARN, Apache Samza etc., can now be confident that they are building on a very stable base.

We blogged about resources for getting started with YARN over here – now is the time!

Details

Compared with the last major release in the Hadoop 2.x series, hadoop-2.1.0-beta brings a significant set of enhancements:

  • API & Protocol Stabilization. The community is now very confident that we can, henceforth, support full compatibility (both API & wire-protocol) for applications built on both HDFS & YARN. See HADOOP-8990 & YARN-386 for details.
  • Binary Compatibility for existing MapReduce applications built for hadoop-1.x. The community decided, at the beginning of the year, to support full binary compatibility for existing MapReduce applications built for Apache Hadoop 1.x, i.e. the current stable release. This means one can take an existing MapReduce application (jars, scripts etc.) and run it unchanged on both hadoop-1.x and hadoop-2.x. This release represents the culmination of that effort and removes the last of the barriers to adoption by easing migration from hadoop-1.x to hadoop-2.x. See MAPREDUCE-5108 for more details.
  • Support for Microsoft Windows. As most people are aware, engineers from Microsoft & Hortonworks have been collaborating with the community to support Hadoop on Windows. As the first official ASF release of Apache Hadoop to support Hadoop 2.x on Microsoft Windows, this represents a major milestone. See HADOOP-8562 for details.
  • HDFS Snapshots. This is the first Apache Hadoop release from the ASF that has full support for HDFS snapshots. See HDFS-2802 for more details.
  • NFSv3 Access for HDFS. This is the first Apache Hadoop release from the ASF that has full support for NFSv3 access to HDFS. See HDFS-4750 for more details.
  • Client APIs for YARN Application Developers. The YARN developer community has completely revamped and simplified client libraries for people developing new YARN applications. See YARN-418 for more details.
  • Integration Testing. This release has gone through a substantial amount of integration testing with the entire Apache Hadoop ecosystem, including Apache HBase, Apache Pig, Apache Hive etc.

Road Ahead to Apache Hadoop 2 GA

With Hadoop 2.x achieving beta status, the community is now fully focused on ironing out the last of the minor issues to prepare for the Hadoop 2.x GA release in the next few weeks.

Currently, we have a handful of issues that we are working on fixing through a follow-up hadoop-2.1.1-beta release in the next few days. Once we get that release done, the plan is to put it through another wringer of a test cycle before we release hadoop-2.2.0, which would be the first GA release of Apache Hadoop 2.x – hopefully by the middle of September 2013. Exciting times, stay tuned!

Acknowledgements

As always, it's an honor and a pleasure to work with the wider Apache Hadoop community – thanks to everyone who contributed! A special note of thanks to Vinod K. V., who has been instrumental in helping shepherd both the YARN API stability work and the MapReduce Binary Compatibility features for this release.


Comments

Senthil | August 26, 2013 at 5:11 am

Hi

Congrats on achieving the milestone.

Is there a VM image of this beta release that can be installed and configured on a laptop as a single-user cluster?

Thanks
