I’m thrilled to note that the Apache Hadoop community has declared Apache Hadoop 2.x as Generally Available with the release of hadoop-2.2.0!
This represents the realization of a massive effort by the entire Apache Hadoop community, an effort that started nearly four years ago, and we’re sure you’ll agree it’s cause for a big celebration. It is equally a great credit to the Apache Software Foundation, which provides an environment where contributors from many places and organizations can collaborate to achieve a goal as significant as Apache Hadoop v2.
Congratulations to everyone!
Apache Hadoop v2 is not just a major release number; it represents a generational shift in the architecture of Apache Hadoop. With YARN, Apache Hadoop is recast as a significantly more powerful platform, one that takes Hadoop beyond batch-only applications and into its position as a ‘data operating system’.
To recap, Apache Hadoop v1 consisted of HDFS and MapReduce.
With HDFS one could store data of any kind; however, MapReduce was the only programming model available for processing that data in parallel. That was very limiting: MapReduce, although very general, proved inadequate for all the demands being placed on Apache Hadoop.
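To make the "only one model" point concrete, here is a minimal single-process sketch of the MapReduce pattern applied to a word count: the user supplies a map function and a reduce function, and the framework groups intermediate pairs by key in between. This is an illustration only, assuming in-memory lists of text records; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_mapreduce`) are hypothetical and are not Hadoop's Java `Mapper`/`Reducer` API, and real Hadoop distributes these phases across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical, single-process toy of the MapReduce model (not Hadoop's API).
# The user writes only map_phase and reduce_phase; the "framework" does the rest.


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every intermediate value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values seen for one key."""
    return key, sum(values)


def run_mapreduce(records: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over all records and collect the results."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_mapreduce(["HDFS stores data", "MapReduce processes data"]))
    # → {'hdfs': 1, 'stores': 1, 'data': 2, 'mapreduce': 1, 'processes': 1}
```

Any workload that does not fit this map, group-by-key, reduce shape (for example an iterative graph algorithm or a continuous event stream) has to be forced into chains of such jobs, which is the limitation described above.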
As Apache Hadoop crystallizes into a key component of a Modern Data Architecture, users and customers want to store all their data in HDFS and interact with that data in multiple ways.
The community has also worked together to make HDFS itself a much more scalable, efficient and enterprise-friendly storage platform by adding key functionality: High Availability for the HDFS NameNode, Federation for scaling, and HDFS Snapshots, to name a few.
With YARN, Apache Hadoop now clearly delineates the system (resource management, security, SLAs etc.) from the application framework (e.g. MapReduce) and allows for multiple ways to interact with the data in HDFS (batch with MapReduce, streaming with Apache Storm, interactive SQL with Apache Hive and Apache Tez).
We are already seeing the benefits of this vision in the many and varied applications and services being re-architected to run on top of YARN, such as Apache Storm for event processing, Apache Giraph for graph processing, Apache Tez for interactive SQL queries, and HOYA for running services such as Apache HBase and Apache Accumulo on YARN. Exciting times indeed!
As a result, the Hadoop stack looks very different with Hadoop v2.
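The split described above, one shared resource manager serving many application frameworks, can be sketched as a toy model. In this sketch the ResourceManager knows only about capacity and hands out containers, while each framework's ApplicationMaster decides what runs inside the containers it is granted. Everything here is a simplifying assumption: the class and method names are hypothetical rather than YARN's actual Java client API, resources are modeled as memory only, and there is no scheduling policy beyond first come, first served.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical toy model of the YARN split (not the real YARN API):
# the ResourceManager only tracks and grants resources (memory in MB here);
# framework-specific logic lives in each ApplicationMaster.


@dataclass
class Container:
    app_id: str
    memory_mb: int


@dataclass
class ResourceManager:
    capacity_mb: int
    allocated: List[Container] = field(default_factory=list)

    def free_mb(self) -> int:
        return self.capacity_mb - sum(c.memory_mb for c in self.allocated)

    def request(self, app_id: str, count: int, memory_mb: int) -> List[Container]:
        """Grant up to `count` containers of `memory_mb` each, while capacity lasts."""
        granted: List[Container] = []
        for _ in range(count):
            if self.free_mb() < memory_mb:
                break
            container = Container(app_id, memory_mb)
            self.allocated.append(container)
            granted.append(container)
        return granted

    def release(self, app_id: str) -> None:
        """Return every container held by one application to the shared pool."""
        self.allocated = [c for c in self.allocated if c.app_id != app_id]


class ApplicationMaster:
    """Per-application, framework-specific logic; the ResourceManager never sees it."""

    def __init__(self, app_id: str, framework: str) -> None:
        self.app_id = app_id
        self.framework = framework
        self.containers: List[Container] = []

    def negotiate(self, rm: ResourceManager, count: int, memory_mb: int) -> int:
        self.containers += rm.request(self.app_id, count, memory_mb)
        return len(self.containers)

    def launch(self) -> List[str]:
        # A real framework would start map tasks, Storm workers, Tez tasks, etc.
        return [f"{self.framework} task in {c.memory_mb} MB container"
                for c in self.containers]


if __name__ == "__main__":
    rm = ResourceManager(capacity_mb=8192)
    batch = ApplicationMaster("app-1", "MapReduce")
    stream = ApplicationMaster("app-2", "Storm")
    print(batch.negotiate(rm, count=4, memory_mb=1024))   # 4
    print(stream.negotiate(rm, count=6, memory_mb=1024))  # 4 (only 4096 MB left)
    print(rm.free_mb())                                   # 0
    rm.release("app-1")
    print(rm.free_mb())                                   # 4096
```

The design point the sketch tries to capture is that adding a new processing model means writing a new ApplicationMaster, not changing the ResourceManager, which is what lets batch and streaming frameworks share one cluster.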
Personally, it’s a huge thrill to see this baby grow up and reach adulthood, more than 5 ½ years after the original Jira ticket (MAPREDUCE-279) was opened!
As many of you know, Apache Hadoop 2 received the Beta tag a few months ago. Since then the community has spent a lot of time validating the APIs, the protocols and the system itself. As a result, we are now very confident not only in our ability to handle the workloads that will be thrown at Apache Hadoop, but also in our ability to do so in a forward-compatible manner, so that Apache Hadoop v2 is a stable base on which the ecosystem can flourish in the future.
For those who, like me, are more comfortable with simplified lists (*smile*), here are the enhancements and major features:
Although this is a major milestone and a big reason to celebrate, the Apache Hadoop community will continue to drive Hadoop forward under the aegis of the ASF. There are ever more things to do, use cases to fulfill and users to thrill. The HDFS community is working hard to finish adding symlinks to HDFS, which just missed the cut at the last minute. On the YARN side, we plan to add more enhancements such as advanced scheduling features, high availability for the YARN ResourceManager and enhanced support for long-running services, and generally to make it easier to run other applications such as Apache Storm within YARN. Stay tuned!
As always, it’s an honor and a pleasure to work with the entire Apache Hadoop community. Thanks to everyone who contributed!
Apache, Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Phoenix, NiFi, Nifi Registry, HAWQ, Zeppelin, Slider, Mahout, MapReduce, HDFS, YARN, Metron and the Hadoop elephant and Apache project logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States or other countries.
© 2011-2018 Hortonworks Inc. All Rights Reserved.
Comments
Congratz on reaching this milestone, looking forward to all the new stuff
Will there be a Hadoop 2 VM sandbox released any time soon?
Yes!
You can find the HDP 2 Beta Sandbox at https://hortonworks.com/products/hdp-2/#install
Appears to only be for 64-bit Linux. Anything for 32-bit Windows?
Also, when will a good Hadoop 2.x tutorial be released?
You can find some YARN resources on our Getting Started pages: https://hortonworks.com/get-started/develop
It’s just hats off to you guys.
To see this level of achievement in producing this type of platform, one that is bringing about a paradigm shift so quickly, seems to me a feat not only of development but of the Apache community as a whole.
Most companies are still trying just to get their head around the basic concepts!
Really, hats off.
Is Hadoop 2.2 available on Windows Server and on local Windows machines?
Congratulations on reaching this important milestone
Congrats Arun and all the ASF team !!
It’s a great time to be in the software industry. Wishing you all success, and looking forward to contributing back.
Regards,
Manish
Congratulations! After all these improvements, should we expect any changes in Hive and Pig?