November 15, 2011

Apache Hadoop 0.23 is Here!

As the Release Manager, it’s my privilege to present Apache Hadoop 0.23:


This post gives a short overview of the release; more details are available in my recent talk on Apache Hadoop 0.23 at Hadoop World 2011.


As shown by the above timeline of Apache Hadoop releases, hadoop-0.23 is the first major release off the Apache Hadoop mainline on track to be stable since hadoop-0.20 in April, 2009 – very exciting times indeed for the Hadoop community!

The Release

As you might be aware, hadoop-0.23 contains significant advances at all levels. Undoubtedly, the highlights are:

  • HDFS Federation
  • NextGen MapReduce

HDFS Federation
HDFS has undergone a transformation that separates Namespace management from Block (storage) management, which allows the filesystem to scale significantly. In the current architecture the two are intertwined in the NameNode.

However, we have ensured that existing HDFS APIs continue to work as before, so user applications do not need to be modified.
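
To make the separation a little more concrete, here is a minimal toy sketch in Python. It is not HDFS code: the class names `BlockStorage` and `Namespace`, the 4-byte block size, and the example paths are all assumptions invented for illustration. It only shows the idea that namespace management (paths to block ids) can live apart from block storage (block ids to bytes), so that several independent namespaces can sit on top of one storage layer.

```python
class BlockStorage:
    """Toy block layer: knows only block ids and bytes, nothing about paths."""

    def __init__(self):
        self._blocks = {}
        self._next_id = 0

    def write(self, data):
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = data
        return block_id

    def read(self, block_id):
        return self._blocks[block_id]

    def block_count(self):
        return len(self._blocks)


class Namespace:
    """Toy namespace manager: maps paths to block ids, stores no data itself."""

    def __init__(self, name, storage):
        self.name = name
        self._storage = storage
        self._files = {}

    def create(self, path, data, block_size=4):
        # Split the data into fixed-size blocks and hand them to the storage layer.
        self._files[path] = [
            self._storage.write(data[i:i + block_size])
            for i in range(0, len(data), block_size)
        ]

    def open(self, path):
        # Reassemble the file by reading its blocks back in order.
        return b"".join(self._storage.read(b) for b in self._files[path])

    def list(self):
        return sorted(self._files)


# Two independent namespaces sharing one (hypothetical) storage layer.
storage = BlockStorage()
users = Namespace("users", storage)
logs = Namespace("logs", storage)
users.create("/users/a.txt", b"hello world")
logs.create("/logs/x", b"abc")
print(users.open("/users/a.txt"), users.list(), logs.list())
```

In this sketch, adding another namespace means creating another `Namespace` object; the storage layer does not change, which is the scaling property the separation is meant to provide.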

More details are available in the HDFS Federation release documentation or in the recent HDFS Federation talk at Hadoop World 2011 by Suresh Srinivas, a Hortonworks co-founder.

NextGen MapReduce aka YARN
MapReduce has undergone a complete overhaul in hadoop-0.23. The fundamental change is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job in the classical sense of MapReduce jobs or a DAG of jobs. Thus, Hadoop becomes a general-purpose data-processing platform that can support MapReduce as well as other application execution frameworks such as MPI.

However, note that existing MapReduce applications should continue to work as-is, and users shouldn’t notice the underlying framework changes, i.e., the replacement of JobTracker/TaskTracker with ResourceManager/NodeManager.
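
The division of labor between the two roles can be sketched with a small toy model in Python. This is not the YARN API: the method names (`allocate`, `release`, `run`), the notion of a simple container count, and the example tasks are assumptions made up for illustration. The point is only that the global ResourceManager hands out and reclaims capacity, while each ApplicationMaster schedules and tracks the tasks of its own application.

```python
class ResourceManager:
    """Toy global arbiter: hands out and reclaims containers, nothing else."""

    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, requested):
        # Grant as many containers as are free, up to the requested number.
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        self.free += count


class ApplicationMaster:
    """Toy per-application master: schedules and monitors its own tasks."""

    def __init__(self, app_name, tasks, rm):
        self.app_name = app_name
        self.pending = list(tasks)
        self.completed = []
        self.rm = rm

    def run(self):
        while self.pending:
            granted = self.rm.allocate(len(self.pending))
            if granted == 0:
                raise RuntimeError("no containers available")
            # Run one batch of tasks, one task per granted container.
            batch, self.pending = self.pending[:granted], self.pending[granted:]
            self.completed.extend(task() for task in batch)
            self.rm.release(granted)
        return self.completed


# One hypothetical application with five tasks on a two-container cluster.
rm = ResourceManager(total_containers=2)
am = ApplicationMaster("squares", [lambda i=i: i * i for i in range(5)], rm)
print(am.run())
```

Because the ResourceManager in this sketch knows nothing about what a task is, a different kind of ApplicationMaster (say, one for an MPI-style job) could request containers from it in the same way.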

More details are available in the YARN release documentation or in the recent YARN presentation at Hadoop World 2011 by Mahadev Konar, a Hortonworks co-founder.

(Lots More)

Note that hadoop-0.23 has other significant enhancements:

  • Performance is 2x+ across the board (HDFS read/write path improvements, the MapReduce shuffle re-write from Owen and me for the 2009 Terasort record, optimizations for small jobs, etc.)
  • Full mavenization of the build (thanks to Alejandro Abdelnur & Tom White)
  • Re-write of HDFS edits log (thanks to Todd Lipcon)
  • Many, many more …

Next Steps

hadoop-0.23 is a big advance and, as with any big leap, it will take a little while for us to stabilize the release. Thus, please note that hadoop-0.23.0 is very much alpha quality, and we do not recommend using it in production yet!

If you are interested in what it takes and how we stabilize a major Hadoop release, please refer to my Apache Hadoop 0.23 presentation at Hadoop World, 2011.

Oh, the Hadoop HDFS developer community is also working on incorporating High Availability for the HDFS NameNode in an upcoming release from the hadoop-0.23 branch. More details are available here and in the recent HDFS HA talk by Suresh Srinivas & Aaron Myers at Hadoop World 2011.

We are currently in the process of rolling out hadoop-0.23.0 to test/alpha clusters (small clusters of ~500 nodes) at Yahoo and are excited to report that Pig, Hive, HBase, Oozie etc. should be integrated in very short order.


Apache Hadoop 0.23 is a quantum leap for the Hadoop community and we are very excited to have it released. Please do try the release (download it here) and provide us with feedback and help to stabilize it.

Again, I’d like to emphasize that we have taken great care to ensure existing applications using the HDFS and MapReduce APIs do not need to be modified to use the hadoop-0.23 release.

My personal, biased, highlight: NextGen MapReduce… and I really am proud of the efforts we’ve put in over the last 18 months or so to get this out. Well, I did warn that I was biased! 🙂

~Arun C. Murthy



Robert Stober says:

How does the NextGen MapReduce compare to Platform Computing’s commercial MapReduce product? Is theirs really better as they claim?

Tejas says:

In version 0.23, is there going to be an enhancement in monitoring and controlling apis exposed?

Are there any administration apis exposed?
