Update on Apache Hadoop-0.23
There has been a lot of progress on hadoop-0.23. We’re continuing to crank through issues as we get ready to ship.
We are mostly past the initial challenges of moving our entire build infrastructure to Maven. Many thanks to Alejandro, Tom, Giri & Eric Yang for making it happen.
HDFS is nearly there:
- HDFS Federation and client-side mount tables have been tested on ~300-node clusters with security enabled.
- HDFS upgrades have been tested from 0.20.2xx.
- Functional tests for HDFS are complete.
NextGen MapReduce (aka MRv2, aka YARN) is making great progress:
- We are happy to report that we’ve done extensive scale testing to confirm stability:
  - Sort, GridMixv3, etc. at ~350 nodes
  - Scale testing with simulated clusters of ~1500 nodes
- Functional tests covering all MapReduce functionality
- Pig (0.9 & 0.9.1) working with NextGen MapReduce
- All of the above has been done with no regressions in security.
We expect to finish performance certification for both HDFS & MapReduce within the next couple of weeks. Once that is complete, we will start integration tests with HBase, Hive, Oozie, etc.
We fixed 75 bugs in September alone and have another 50 or so to go. At least four different organizations contributed patches to MRv2 in September: Yahoo, Hortonworks, LinkedIn & Huawei.
Given our current state, I’m confident we will have a strong hadoop-0.23.0 release by late October. The current plan is to deploy to alpha clusters in November. Citius, Altius, Fortius!
Thanks to everyone who contributed; we look forward to your continued help.
Arun C. Murthy (@acmurthy)