Update on Apache Hadoop-0.23

There has been a lot of progress on hadoop-0.23. We’re continuing to crank through issues as we get ready to ship.

We are mostly past the initial challenges of moving our entire build infrastructure to Maven. Many thanks to Alejandro, Tom, Giri & Eric Yang for making it happen.

HDFS is nearly there:

  • HDFS Federation and client-side mount tables have been tested on ~300-node clusters with security on.
  • HDFS upgrades have been tested from 0.20.2xx.
  • Functional tests for HDFS are complete.

NextGen MapReduce (aka MRv2, aka YARN) is making great progress:

  • We are happy to report we’ve done extensive scale testing to confirm stability:
    • Sort/GridMixv3 etc. at ~350 nodes
    • Scale testing with simulated clusters of ~1500 nodes
  • Functional tests covering all MapReduce functionality
  • Pig (0.9 & 0.9.1) working with NextGen MapReduce
  • All of the above has been done with no regressions in security.

We expect to finish performance certification for both HDFS & MapReduce within the next couple of weeks. Once that is complete, we will start integration tests with HBase, Hive, Oozie, etc.

We fixed 75 bugs in September alone and have another 50 or so to go. At least four different organizations contributed patches to MRv2 in September: Yahoo, Hortonworks, LinkedIn & Huawei.

Given our current state, I’m confident we will have a strong hadoop-0.23.0 release by late October. The current plan is to deploy to alpha clusters in November. Citius, Altius, Fortius! :)

Thanks to everyone who contributed and we look forward to continued help.

Arun C. Murthy (@acmurthy)

