We are very excited to announce that the release of NextGen Apache Hadoop MapReduce is getting close. We just merged the code base into Apache Hadoop mainline, and Arun is about to create a hadoop-0.23 branch to prepare for a release!
We’ve talked about NextGen Apache Hadoop MapReduce and its advantages. The drawbacks of the current Apache Hadoop MapReduce are both old and well understood. The proposed architecture has been in the public domain for over 3 years now. The team started work in August 2010 with a prototype on which we iterated rapidly. This culminated in an initial check-in to Apache Hadoop SVN in March 2011. Since then we’ve done all development on the MR-279 branch in Apache and have worked really hard to get NextGen Hadoop MapReduce ready. We hope to see it soon on *your* cluster.
Some fun stats:
How to contribute
Now, this is just the beginning. There is still much to do. Making the MapReduce framework production-quality is the top priority, but implementing or porting alternative computing frameworks will excite some contributors as well. To help that cause, I am pasting the new source code directory structure here:
I know no single list can be comprehensive given the sheer size of the effort, but I wanted to recognize all of the contributors: Arun C. Murthy, Christopher Douglas, Devaraj Das, Greg Roelofs, Jeffrey Naisbitt, Josh Wills, Ahmed Radwan, Jonathan Eagles, Krishna Ramachandran, Luke Lu, Mahadev Konar, Robert Evans, Sharad Agarwal, Siddharth Seth, Thomas Graves, Ramya Sunil (testing), Giridharan Kesavan (release engineering), Karam Singh and Santosh Kumar (performance engineering).
Alright, it’s time you checked out the bleeding-edge Hadoop MapReduce trunk. Start hacking and have fun!
– Vinod Kumar Vavilapalli a.k.a @tshooter