Announcing Apache Hadoop 2.0.3 Release and Roadmap


As the Release Manager for hadoop-2.x, I’m very pleased to announce the next major milestone for the Apache Hadoop community, the release of hadoop-2.0.3-alpha!

2.0 Enhancements in this Alpha Release

This release delivers significant enhancements and improved stability over previous releases in the hadoop-2.x series. Notably, it includes:

  • QJM (Quorum Journal Manager) for HDFS NameNode HA (HDFS-3077), plus related stability fixes to HDFS HA
  • Multi-resource scheduling (CPU and memory) for YARN (YARN-2, YARN-3 & friends)
  • YARN ResourceManager Restart (YARN-230)
  • Significant stability improvements at scale for YARN (over 30,000 nodes and 14 million applications so far, at the time of release; the folks at Yahoo! have published more details)
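
To give a feel for what multi-resource scheduling means in practice, here is a minimal sketch. It is not the YARN API: the `Capacity` class and its `fits` and `minus` methods are hypothetical names made up for illustration, and the numbers are arbitrary. The only point is that a container request must fit in every dimension (memory and CPU), so a node with plenty of free memory can still reject a request when it is short on cores.

```java
// Illustrative sketch only: these types are hypothetical, not YARN classes.
public class MultiResourceSketch {

    /** A two-dimensional resource vector: memory in MB and virtual cores. */
    static final class Capacity {
        final int memoryMb;
        final int vcores;

        Capacity(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        /** A request fits only if it fits in every dimension, not just memory. */
        boolean fits(Capacity request) {
            return request.memoryMb <= memoryMb && request.vcores <= vcores;
        }

        /** Capacity left over after granting a request (caller checks fits first). */
        Capacity minus(Capacity request) {
            return new Capacity(memoryMb - request.memoryMb, vcores - request.vcores);
        }
    }

    public static void main(String[] args) {
        Capacity node = new Capacity(8192, 4);     // assumed node: 8 GB, 4 cores
        Capacity first = new Capacity(4096, 3);    // CPU-heavy request
        Capacity second = new Capacity(2048, 2);   // small in memory, needs 2 cores

        System.out.println(node.fits(first));      // true
        Capacity remaining = node.minus(first);    // 4096 MB and 1 core left

        // A memory-only check would accept this request; the multi-resource
        // check rejects it because only one core remains.
        System.out.println(remaining.fits(second)); // false
    }
}
```

A memory-only scheduler would have placed the second request on this node and oversubscribed its CPU; tracking both resources avoids that.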

Where is hadoop-2 and What is Left?

It is important to note that this release is still considered alpha, as a few items need to be addressed before we enter beta in the next couple of months. Most importantly, some of the APIs, particularly the HDFS and YARN protobuf-based protocols, aren’t fully baked. Also note that there are some API changes from the previous hadoop-2.0.2-alpha release, so your applications will need to be recompiled against the new hadoop-2.0.3-alpha. Please see the Hadoop 2.0.3-alpha release notes for details.

We are converging fast on ironing out the API issues (in both HDFS and YARN/MapReduce) and currently plan to cut a hadoop-2.0.4-beta release in the next couple of months, once this effort is complete. It also helps to have a major presence like Yahoo! testing hadoop-2 HDFS HA over the coming months, as they’ve noted in their blog. The code base has also gone through significant churn and, as with any alpha, we expect to uncover further issues as this testing continues.

There is still a lot of work ahead of us, but we believe that hadoop-2.0.4-beta will be a major step towards a fully stable, supported hadoop-2 release. Exciting times! Stay tuned!


As always, it’s a pleasure to work with everyone in the community. Thank *you* to everyone who has contributed to this release. A special mention goes to Todd Lipcon for his contributions to QJM for HDFS HA, and to the Yahoo! Hadoop team (Robert Evans, Thomas Graves, Daryn Sharp, Jason Lowe and everyone else) for their efforts in getting YARN to stability and large-scale deployments on their clusters.

Arun C. Murthy
