December 01, 2014

Announcing Apache Hadoop 2.6.0

It gives me great pleasure to announce that the Apache Hadoop community has released Apache Hadoop 2.6.0!

In particular, we are excited about three major pieces in this release: heterogeneous storage in HDFS with SSD and memory tiers, support for long-running services in YARN, and rolling upgrades, that is, the ability to upgrade your cluster software and restart upgraded nodes without taking the cluster down or losing work in progress. With YARN as its architectural center, Hadoop continues to attract new engines to run within the data platform, as organizations want to store their data efficiently in a single repository and interact with it simultaneously in different ways.

Many thanks to all of the contributors and committers who collaborated on this version and resolved nearly 900 JIRA issues (896 by the counts below) across four areas:

  • Hadoop Common: 231 JIRAs resolved
  • Hadoop HDFS: 305 JIRAs resolved
  • Hadoop YARN: 290 JIRAs resolved
  • Hadoop MapReduce: 70 JIRAs resolved

Highlights for Apache Hadoop 2.6.0

Here are some details about the most important features. For the complete list of features, improvements and bug fixes, see the sidebar and the release notes.

[Image: Screen Shot 2014-12-01 at 10.10.01 AM]

Enhanced Support for Heterogeneous Storage Tiers in HDFS

Administrators can define storage tiers across the disks in a DataNode, and applications can use APIs to store data on these different tiers. This means that administrators can optimize applications running on Hadoop by using:

  • The SSD storage tier to improve read/write latency
  • The memory storage tier for fast reads and writes of temporary data, or for fault-aware applications (e.g. Spark, Tez)
  • The archive storage tier to improve storage efficiency
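
As a rough illustration of how tiers are defined on a DataNode: to our knowledge, each directory listed in dfs.datanode.data.dir can be prefixed with a storage-type tag such as [SSD], [DISK], [ARCHIVE] or [RAM_DISK]. The Python sketch below only builds such a value. The helper name tagged_data_dirs and the mount points are hypothetical, so check the HDFS documentation for your release before relying on the format.

```python
# Storage-type tags that, to our knowledge, HDFS accepts on data directories.
STORAGE_TYPES = {"DISK", "SSD", "ARCHIVE", "RAM_DISK"}


def tagged_data_dirs(volumes):
    """Build a dfs.datanode.data.dir value from (storage_type, path) pairs.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    entries = []
    for storage_type, path in volumes:
        tag = storage_type.upper()
        if tag not in STORAGE_TYPES:
            raise ValueError(f"unknown storage type: {storage_type}")
        # Each entry is the bracketed tag followed by a file:// URI.
        entries.append(f"[{tag}]file://{path}")
    return ",".join(entries)


# Assumed mount points; substitute the volumes on your own DataNodes.
example = tagged_data_dirs([
    ("SSD", "/grid/ssd0/dn"),
    ("DISK", "/grid/disk0/dn"),
    ("ARCHIVE", "/grid/archive0/dn"),
])
print(example)
# [SSD]file:///grid/ssd0/dn,[DISK]file:///grid/disk0/dn,[ARCHIVE]file:///grid/archive0/dn
```

The resulting string would be set as the value of the dfs.datanode.data.dir property in hdfs-site.xml on each DataNode.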

Support for Long-Running Services in YARN

Apache Hadoop 2.6.0 includes enhancements to the core Apache Hadoop YARN platform so that long-lived services (such as Apache Storm, Apache Samza, Apache Kafka or Apache HBase) can run in YARN and take advantage of its strengths for fault tolerance, security and ease of maintenance.

Apache Hadoop was originally architected for processing data in batch. But some applications are “always on,” ready to process incoming data. For example, Apache Storm must be ready to process streaming data in real time at any time of day, on any day of the year.

With Hadoop 2.6.0, clusters can now use the same infrastructure to schedule, execute and manage multiple workloads of all durations. Long-lived services like Storm and HBase can coexist with applications that are used for ad hoc work at a particular point in time (like Apache Hive or Apache Pig).

Rolling Upgrades for Work-Preserving Restarts in YARN

The new work-preserving restart feature allows applications to keep their completed and in-progress state across a node failure or restart. As a result, YARN can now provide rolling upgrade support with minimal service degradation for running applications: when a node restarts, work resumes from where it left off instead of restarting all tasks from the beginning.
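
A minimal sketch of the configuration side, in Python: it renders a yarn-site.xml fragment with the recovery switches that we understand control work-preserving restart for the ResourceManager and NodeManager. The property names come from our reading of yarn-default.xml and should be verified against your release. The helper name yarn_recovery_fragment and the recovery directory are hypothetical, and a ResourceManager state store also has to be configured, which is omitted here.

```python
import xml.etree.ElementTree as ET


def yarn_recovery_fragment(nm_recovery_dir):
    """Render a yarn-site.xml fragment enabling work-preserving restart.

    Hypothetical helper for illustration; verify each property name
    against the yarn-default.xml shipped with your Hadoop release.
    """
    props = {
        "yarn.resourcemanager.recovery.enabled": "true",
        "yarn.resourcemanager.work-preserving-recovery.enabled": "true",
        "yarn.nodemanager.recovery.enabled": "true",
        # Local directory where the NodeManager keeps its recovery state.
        "yarn.nodemanager.recovery.dir": nm_recovery_dir,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Assumed path; choose a directory that survives a NodeManager restart.
fragment = yarn_recovery_fragment("/var/hadoop/yarn-nm-recovery")
print(fragment)
```

The printed properties would be merged into the existing yarn-site.xml rather than replacing it.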

Looking Ahead to the Apache Hadoop 2.7 Release

The key driver for the next release of Apache Hadoop is the move to JDK7+: going forward, we will mandate JDK7 for Apache Hadoop (HADOOP-10530) and also support JDK8 as a runtime (HADOOP-11090).

Other important activities going on in the Apache Hadoop community are:

  • Support for Erasure Codes in HDFS – HDFS-7285
  • Support disk as a resource in YARN for scheduling and isolation – YARN-2139
  • Container resource delegation to extend YARN resource management – YARN-1488

As always, you can follow developments by tracking the Roadmap Wiki for Apache Hadoop.

Acknowledgements

Many thanks to everyone who contributed to this release, and to the entire Apache Hadoop community.

Comments

  • Great stuff.

    Yarn node labels also need data placement control to be fully useful I believe?

    The HDFS encryption and open source Key Management server can’t come soon enough either.

    We’re literally front-running Apache development in our architectural design planning at the moment for features like storage tiering, unified auditing, and encryption with key management. Such is the demand for them in high-end enterprises that want to avoid proprietary lock-in components, and that do not want to worry about migrating the platform to HDP in later years from short-term proprietary solutions when they could build on the standard HDP platform from the start.

  • I assume that the PNG file is supposed to have links to the relevant issues? At present, all the links point to the PNG file. I have tried both Chrome and Firefox with the same results.

    Thanks for all the great work!

    Patrick

  • You guys need to update the Apache docs on how to set up hadoop 2.6.0 in clustered operation. The following:

    http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

    still suggests distribution of RPMs (which are no longer offered on Apache mirrors), and references config files off of conf/* where they’re now at $HADOOP_PREFIX/etc/hadoop/*.xml

    I’m just attempting a first installation, and it’s not clear (from what I’ve read so far) if the software assumes to reference config files from this path, or if we need to copy those files to conform to the documentation.
