In this post, we’ll explain the difference between Hadoop 1.0 and 2.0. After all, what is Hadoop 2.0? What is YARN?
For starters – what is Hadoop and what is 1.0? The Apache Hadoop project is the core of an entire ecosystem of projects. It consists of four modules (see here):
Hadoop 1.0 is based on the Hadoop .20.205 branch (it went 0.18 -> 0.19 -> 0.20 -> 0.20.2 -> 0.20.205 -> 1.0). Hard to follow? Check out this chart. Not hard for an open source developer, but obscure for an enterprise product – so everyone agreed to call 0.20.205 ‘1.0’, the project having matured to that point.
Hadoop 2.0 is from the Hadoop 0.23 branch, with major components re-written to enable support for features like High Availability, and MapReduce 2.0 (YARN), and to enable Hadoop to scale out past 4,000 machines per cluster. Specifically, Hadoop 2.0 adds (see here):
Hadoop 1.0 is rock-solid, Hadoop 2.0 is still in active development and is considered Alpha. Work continues to stabilize Hadoop 2.0, so stay tuned!