October 15, 2013

Apache Hadoop 2 is now GA!

I’m thrilled to note that the Apache Hadoop community has declared Apache Hadoop 2.x as Generally Available with the release of hadoop-2.2.0!

This represents the realization of a massive effort by the entire Apache Hadoop community, one that started nearly four years ago, and we’re sure you’ll agree it’s cause for a big celebration. It is equally a great credit to the Apache Software Foundation, which provides an environment where contributors from many places and organizations can collaborate to achieve a goal as significant as Apache Hadoop v2.

Congratulations to everyone!

The Journey

Apache Hadoop v2 is not just a new major release number; it represents a generational shift in the architecture of Apache Hadoop. With YARN, Apache Hadoop is recast as a significantly more powerful platform, one that takes Hadoop beyond batch-only applications and into its position as a ‘data operating system’.

To recap, Apache Hadoop v1 consisted of HDFS and MapReduce.

With HDFS, one could store data of all kinds; however, MapReduce was the only processing framework available for working on that data in parallel. That was very limiting because MapReduce, although very general, proved inadequate to satisfy all the demands being placed on Apache Hadoop.
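To make that constraint concrete, here is a single-process sketch of the MapReduce programming model in plain Python. It is not the Hadoop Java API, and the function names (`map_reduce`, `word_count_mapper`, `word_count_reducer`) are hypothetical. Every Hadoop v1 job had to be phrased in this shape: a map step that emits key/value pairs, a shuffle that groups the values by key, and a reduce step called once per key.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    # Map phase: each input record yields zero or more (key, value) pairs.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # Shuffle: collect values under their key.
    # Reduce phase: one reducer call per key, with all of that key's values.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    return sum(counts)


lines = ["Hadoop stores data", "YARN schedules data processing"]
result = map_reduce(lines, word_count_mapper, word_count_reducer)
print(result)
# {'data': 2, 'hadoop': 1, 'processing': 1, 'schedules': 1, 'stores': 1, 'yarn': 1}
```

Word counting fits this shape naturally; workloads such as event streaming or interactive queries have to be forced into it, which is the limitation YARN removes.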

As Apache Hadoop crystallizes into a key component of a Modern Data Architecture, users and customers want to store all data in HDFS and interact with that data in multiple ways:

  • Real-time processing of events (sensor, telecommunications, fraud detection, etc.), even before they land in HDFS
  • Interactive query capabilities, so that data analysts (SQL) and data scientists (SQL plus scripting, etc.) can interrogate new data
  • Productionizing the resulting insights, i.e., batch processing, reporting, etc., in a well-defined and timely manner

The community has worked together to make HDFS itself a much more scalable, efficient and enterprise-friendly storage platform by adding key functionality such as High Availability for the HDFS NameNode, Federation for scaling, and HDFS Snapshots, to name a few.

With YARN, Apache Hadoop now clearly delineates the system (resource management, security, SLAs etc.) from the application framework (e.g. MapReduce) and allows for multiple ways to interact with the data in HDFS (batch with MapReduce, streaming with Apache Storm, interactive SQL with Apache Hive and Apache Tez).
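The split between the system and the application frameworks can be illustrated with a toy model. This sketch is not Hadoop’s API: the `ResourceManager` and `Container` classes and the application names are hypothetical, and real YARN tracks resources across many NodeManagers using pluggable schedulers such as the CapacityScheduler. It only shows the idea that one shared component hands out resources while each framework decides what to run inside them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity equality: every granted container is distinct
class Container:
    app: str
    memory_mb: int


@dataclass
class ResourceManager:
    """Hypothetical stand-in for YARN's ResourceManager.

    It only tracks and grants cluster memory; it knows nothing about how
    any application processes its data.
    """
    total_memory_mb: int
    allocated: List[Container] = field(default_factory=list)

    def free_memory_mb(self) -> int:
        return self.total_memory_mb - sum(c.memory_mb for c in self.allocated)

    def request(self, app: str, memory_mb: int) -> Optional[Container]:
        """Grant a container if enough memory is free; otherwise return None."""
        if memory_mb <= self.free_memory_mb():
            container = Container(app, memory_mb)
            self.allocated.append(container)
            return container
        return None

    def release(self, container: Container) -> None:
        self.allocated.remove(container)


# Different (hypothetical) application frameworks share one cluster; each
# decides for itself what to run in its containers (batch, streaming, SQL).
rm = ResourceManager(total_memory_mb=8192)
batch = rm.request("mapreduce-job", 4096)
stream = rm.request("storm-topology", 2048)
query = rm.request("tez-query", 4096)   # None: only 2048 MB are free
print(batch is not None, stream is not None, query is None, rm.free_memory_mb())

rm.release(batch)                       # the batch job finishes
query = rm.request("tez-query", 4096)   # now the query fits
print(query is not None, rm.free_memory_mb())
```

Because the resource manager never inspects what a container runs, adding a new processing model means writing a new framework, not changing the cluster’s core.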

We are already seeing the benefits of this vision in the form of many and varied applications and services being re-vectored on top of YARN such as Apache Storm for event processing, Apache Giraph for graph processing, Apache Tez for interactive SQL queries, HOYA for running services such as Apache HBase and Apache Accumulo on YARN and so on. Exciting times indeed!

As a result, the Hadoop stack looks very different with Hadoop v2.


Personally, it’s a huge thrill to see this baby grow up and reach adulthood, more than 5½ years after the original Jira ticket (MAPREDUCE-279) was opened!

Apache Hadoop v2

As many of you know, Apache Hadoop 2 reached Beta a few months ago. Since then, the community has spent a lot of time validating the APIs, the protocols and the system itself. As a result, we are now very confident not only in our ability to handle the workloads that will be thrown at Apache Hadoop, but also in our ability to do so in a forward-compatible manner, so that Apache Hadoop v2 represents a stable base atop which the ecosystem can flourish.

For those who, like me, are more comfortable with simplified lists (*smile*), here are the enhancements and major features:

  • YARN
  • High Availability for HDFS
  • HDFS Federation
  • HDFS Snapshots
  • NFSv3 access to data in HDFS
  • Binary Compatibility for MapReduce applications between Hadoop v1 and Hadoop v2 to ease migration
  • Performance improvements
  • Support for running Hadoop on Microsoft Windows
  • Integration testing for the entire Apache Hadoop ecosystem at the ASF


Although this is a major milestone and a big reason to celebrate, the Apache Hadoop community will continue to drive Hadoop forward under the aegis of the ASF. There are ever more things to do, use cases to fulfill and users to thrill. The HDFS community is working hard to finish adding symlinks to HDFS, which just missed the cut at the last minute. On the YARN side, we plan to add more enhancements, such as advanced scheduling features, high availability for the YARN ResourceManager, enhanced support for long-running services, and generally making it easier to run other applications, such as Apache Storm, within YARN. Stay tuned!


As always, it’s an honor and a pleasure to work with the entire Apache Hadoop community. Thanks to everyone who contributed!



Denis Gobo says:

Congratz on reaching this milestone, looking forward to all the new stuff

Ray Niccolls says:

Will there be a Hadoop 2 VM sandbox released any time soon?

James Dilworth says:


You can find the HDP 2 Beta Sandbox at

Lyle Z says:

Appears to only be for 64-bit Linux. Anything for 32-bit Windows?

Also, when will a good Hadoop 2.x tutorial be released?

Marc Holmes says:

You can find some YARN resources on our Getting Started pages:

Jamie Sutphin says:

It’s just hats off to you guys.

To see this high level of achievement in producing this type of platform, one that is bringing about a paradigm shift so quickly, seems to me a feat not only of development but of the Apache community as a whole.

Most companies are still trying just to get their head around the basic concepts!

Really, hats off.

Padma says:

Is Hadoop 2.2 available on Windows Server and on local Windows machines?

Sanjeev says:

Congratulations on reaching this important milestone

Manish Malhotra says:

Congrats Arun and all the ASF team !!

It’s a great time to be in the software industry. Wishing you all success, and looking forward to contributing back.


Mandana says:

Congratulations! After all these improvements, should we expect any changes in Hive and Pig?
