It gives me great pleasure to announce that the Apache Hadoop community has released Apache Hadoop 2.6.0!
In particular, we are excited about three major pieces in this release: heterogeneous storage in HDFS with SSD and memory tiers, support for long-running services in YARN, and rolling upgrades, that is, the ability to upgrade your cluster software and restart upgraded nodes without taking the cluster down or losing work in progress. With YARN as its architectural center, Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways.
Many thanks to all of the contributors and committers who collaborated on this version and resolved a total of nearly 900 JIRA issues across four areas:
Here are some details about the most important features. For the complete list of features, improvements and bug fixes, see the sidebar and the release notes.
Administrators can define storage tiers across the disks in a DataNode, and applications can use APIs to store data on these different storage tiers. This means that administrators can optimize their applications running on Hadoop by using:
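To make the idea of storage tiers more concrete, here is a minimal sketch of how a policy might map a file's replicas onto tiers. It is a toy model, not the HDFS implementation: the class `StoragePolicySketch` and its `placement` method are hypothetical, the policy and storage type names are modeled on the built-in HDFS storage policies, and the fallback behavior a real cluster applies when a tier is full or missing is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of tiered replica placement; NOT the HDFS implementation. */
final class StoragePolicySketch {

    /** Storage media that a DataNode volume might be tagged with. */
    enum StorageType { RAM_DISK, SSD, DISK, ARCHIVE }

    /** Each policy names a tier for the first replica and a tier for the rest. */
    enum Policy {
        HOT(StorageType.DISK, StorageType.DISK),
        WARM(StorageType.DISK, StorageType.ARCHIVE),
        COLD(StorageType.ARCHIVE, StorageType.ARCHIVE),
        ONE_SSD(StorageType.SSD, StorageType.DISK),
        ALL_SSD(StorageType.SSD, StorageType.SSD),
        LAZY_PERSIST(StorageType.RAM_DISK, StorageType.DISK);

        private final StorageType first;
        private final StorageType rest;

        Policy(StorageType first, StorageType rest) {
            this.first = first;
            this.rest = rest;
        }

        /** Returns the tier chosen for each of the requested replicas, in order. */
        List<StorageType> placement(int replication) {
            List<StorageType> tiers = new ArrayList<>();
            for (int i = 0; i < replication; i++) {
                // Only the first replica may land on the "fast" tier.
                tiers.add(i == 0 ? first : rest);
            }
            return tiers;
        }
    }

    public static void main(String[] args) {
        // Print the simplified placement of three replicas under every policy.
        for (Policy policy : Policy.values()) {
            System.out.println(policy + " -> " + policy.placement(3));
        }
    }
}
```

Under this simplified model, a file with three replicas and the `ONE_SSD` policy keeps one replica on SSD for fast reads and two on ordinary disk, while `COLD` sends every replica to archival storage.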
Apache Hadoop 2.6.0 includes enhancements to the core Apache Hadoop YARN platform so that long-lived services (such as Apache Storm, Apache Samza, Apache Kafka or Apache HBase) can run in YARN and take advantage of its strengths for fault tolerance, security and ease of maintenance.
Apache Hadoop was originally architected for processing data in batch. But some applications are “always on,” ready to process incoming data. For example, Apache Storm must be ready to process streaming data in real time at any time of day, on any day of the year.
With Hadoop 2.6.0, clusters can now utilize the same infrastructure to schedule, execute and manage multiple workloads of all durations. Long-lived services like Storm and HBase can peacefully co-exist alongside applications that are used for ad hoc work at a particular point in time (like Apache Hive or Apache Pig).
The new work-preserving restart feature allows applications to maintain their completed and in-progress state across a node failure or restart. YARN can now provide rolling upgrade support with minimal service degradation for running applications. Application work that has completed or was in progress is preserved during a node restart, and progress picks up without having to restart all tasks from the beginning.
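The principle behind work-preserving restart can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how YARN implements the feature: `WorkPreservingRestartSketch`, `StateStore` and `runTask` are hypothetical names, and an in-memory map stands in for whatever durable store survives the restart. The point is that progress recorded before the interruption is read back afterwards, so only the remaining steps run.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy simulation of resuming work after a restart; NOT YARN's implementation. */
final class WorkPreservingRestartSketch {

    /** Hypothetical stand-in for a durable store that survives a node restart. */
    static final class StateStore {
        private final Map<String, Integer> stepsDone = new HashMap<>();

        void record(String taskId, int completedSteps) {
            stepsDone.put(taskId, completedSteps);
        }

        int recover(String taskId) {
            Integer saved = stepsDone.get(taskId);
            return saved == null ? 0 : saved; // unknown task: start from scratch
        }
    }

    /**
     * Runs a task from its last recorded step. Stops early, simulating a
     * restart, once {@code stepsBeforeRestart} steps have run in this attempt.
     * Returns the number of steps executed in this attempt.
     */
    static int runTask(String taskId, int totalSteps, int stepsBeforeRestart, StateStore store) {
        int executed = 0;
        for (int step = store.recover(taskId); step < totalSteps; step++) {
            if (executed == stepsBeforeRestart) {
                return executed; // simulated node restart
            }
            executed++;                     // do one unit of work
            store.record(taskId, step + 1); // persist progress after each step
        }
        return executed;
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        int firstAttempt = runTask("task-1", 10, 4, store);
        int secondAttempt = runTask("task-1", 10, Integer.MAX_VALUE, store);
        System.out.println("steps before restart: " + firstAttempt);
        System.out.println("steps after restart:  " + secondAttempt);
    }
}
```

In this run the ten-step task executes four steps, is interrupted, and then needs only the remaining six steps, rather than all ten again.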
The key driver for the next release of Apache Hadoop is the move to JDK7+: we will mandate the use of JDK7 (HADOOP-10530) for Apache Hadoop going forward and also support JDK8 as a runtime (HADOOP-11090).
Other important activities going on in the Apache Hadoop community are:
As always, you can follow these developments by tracking the Roadmap Wiki for Apache Hadoop.
Many thanks to everyone who contributed to this release, and to the entire Apache Hadoop community.