April 09, 2015

HDFS Rolling Upgrades

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce support for rolling upgrades and downgrades of an HDFS cluster. See our previous post for an introduction to enterprise-grade rolling upgrades in HDP 2.2.

Hortonworks Data Platform provides centralized enterprise services for consistent operations of Hadoop clusters for a reliable enterprise-ready data lake. Apache Ambari is the operational framework in HDP, and yesterday’s blog post on Apache Ambari 2.0 announced that Ambari now automates the rolling upgrade process across all stack components. In this post, we will focus on how rolling upgrades work in Hadoop’s data storage layer: the Hadoop Distributed File System (HDFS). In a separate post, we will give more detail on Ambari’s new rolling upgrades capabilities.

HDFS is a critical component that every other service in a Hadoop cluster relies on, and all higher-level applications and services depend on it directly or indirectly. Therefore, an HDFS rolling upgrade must complete without cluster downtime, performance degradation, or risk of data loss.

Since it was never practical to ask the admin to back up all the HDFS data before an upgrade, HDFS has included upgrade safeguards for a long time. In the past, this special “backup” mechanism only supported full upgrades, which were performed as follows:

  1. The entire HDFS cluster was brought down.
  2. During the upgrade, HDFS was given a special “-upgrade” flag that told it to make a copy of the metadata and a “copy” of the data from the previous version. The data was not actually copied; instead, the HDFS data blocks were hard linked. This backup was used to roll the cluster back to the older version in case of upgrade issues.
  3. The new software version was installed on the nodes and the cluster was restarted.
  4. If a rollback was necessary after an upgrade, it returned the cluster to its pre-upgrade state. Any data created or changes made in HDFS during the upgrade were lost on rollback.
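The hard-link trick in steps 2 and 4 can be illustrated with a toy model. This is a minimal sketch, not HDFS code: the `current/` and `previous/` directory names, the `blk_*` file names and the helper functions are hypothetical stand-ins for a storage directory. It shows that hard linking preserves the old blocks without copying any bytes, and that a rollback discards anything written after the upgrade began.

```python
import os
import tempfile


def write_block(directory: str, name: str, data: bytes) -> None:
    """Write a toy block file into a storage directory."""
    with open(os.path.join(directory, name), "wb") as f:
        f.write(data)


def upgrade_with_hard_links(storage: str) -> None:
    """Keep the old state in previous/ and hard-link each block into a new current/."""
    current = os.path.join(storage, "current")
    previous = os.path.join(storage, "previous")
    os.rename(current, previous)
    os.mkdir(current)
    for name in os.listdir(previous):
        # A hard link is a second directory entry for the same inode: no bytes are copied.
        os.link(os.path.join(previous, name), os.path.join(current, name))


def rollback(storage: str) -> None:
    """Throw away current/ (including anything written after the upgrade) and restore previous/."""
    current = os.path.join(storage, "current")
    previous = os.path.join(storage, "previous")
    for name in os.listdir(current):
        os.remove(os.path.join(current, name))
    os.rmdir(current)
    os.rename(previous, current)


def demo() -> tuple:
    """Upgrade, write a new block, roll back; report inode sharing and the surviving blocks."""
    with tempfile.TemporaryDirectory() as storage:
        current = os.path.join(storage, "current")
        os.mkdir(current)
        write_block(current, "blk_1", b"written before the upgrade")
        upgrade_with_hard_links(storage)
        shared = (os.stat(os.path.join(current, "blk_1")).st_ino
                  == os.stat(os.path.join(storage, "previous", "blk_1")).st_ino)
        write_block(current, "blk_2", b"written during the upgrade")
        rollback(storage)
        return shared, sorted(os.listdir(current))
```

Calling `demo()` returns `(True, ['blk_1'])`: the linked block shares one inode with its backup, and `blk_2` is gone after the rollback. Because linked files share their data, a real implementation must never modify a linked block in place; this toy sidesteps that by only adding new files.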

With this traditional method, upgrades of large clusters could cause hours of cluster downtime.

Unfortunately, this original backup mechanism does not work for rolling upgrades, because it expects the whole cluster to be down, the antithesis of “rolling”. So we needed to completely revisit it.

Backups are still essential, because doing a rolling upgrade without such a mechanism in place would risk the customer’s data. And because rolling upgrades are performed while the cluster is in active use, we needed to address the following set of requirements with an eye toward the applications that may be accessing the file system during an upgrade:

  • A new backup mechanism that safeguards the data but works for a live rolling upgrade.
  • Support for two types of “reversals”:
    • If the admin finds minor issues during a rolling upgrade, the system must support a downgrade back to the older binary version without losing any newly created data or updates to data.
    • If something goes wrong in the new version of the HDFS software, the system must support HDFS’ traditional rollback to the pre-upgrade state; in that case, data created after the upgrade started is lost.
  • The HDFS service must remain available to protect higher level applications and services from failure.
  • Preserve write performance and prevent data loss: many clients are writing to HDFS during a rolling upgrade. Existing write pipelines break when a DataNode is restarted for the upgrade, and every DataNode in the cluster is restarted in a rolling fashion. This could significantly hurt the performance of applications writing data and risk data loss. Both issues needed to be addressed.
  • Preserve services that use HDFS client libraries to interface with HDFS. These libraries must remain compatible with the newer version.
  • Support isolating the upgrade to just the master nodes (NameNodes) or just the slave nodes (DataNodes) in the cluster. This mixed mode, with nodes running different versions, is very important for rolling out bug fixes only to the NameNodes or only to the DataNodes.

Yahoo and Hortonworks collaborated on a comprehensive solution for HDFS rolling upgrades that addresses those requirements. Now let us describe how it all works.

The HDFS Rolling Upgrade Process

Minimizing HDFS service downtime

Rolling upgrade functionality depends on NameNode high availability and requires NameNode failover. We reduced NameNode failover time to under 60 seconds to shorten the service unavailability experienced by applications. We also accelerated DataNode restarts so that DataNodes can be upgraded quickly without their brief unavailability adversely affecting applications.

Pipeline pausing and other improvements

One of the most important requirements was that ongoing writes do not fail during a rolling upgrade. When a DataNode in a client’s write pipeline is stopped for the upgrade, the client pauses briefly. The DataNode is quickly restarted on the new version of the software, and the client then adds the upgraded DataNode back and resumes writing through the pipeline.

Without this functionality, simply restarting the DataNode on newer software could break the write pipeline and cause an unnecessary pipeline recovery. It is also possible that all the DataNodes in a pipeline are restarted; in that case, if the pipeline did not pause, the writing application might fail. Pausing the pipeline prevents that.
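The difference pausing makes can be seen in a small simulation. This is a minimal sketch on a simulated clock, not the HDFS client: `ToyDataNode`, `write_packets` and the `restart_timeout` limit are hypothetical names chosen for illustration. A writer that pauses waits for a restarting DataNode to return; a writer that does not pause drops the node, and if every node in the pipeline is restarting, the write fails.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyDataNode:
    """A DataNode that is unavailable during [down_from, down_until) on a simulated clock."""
    name: str
    down_from: float = float("inf")
    down_until: float = float("inf")

    def is_up(self, t: float) -> bool:
        return not (self.down_from <= t < self.down_until)


def write_packets(pipeline: List[ToyDataNode], packets: int, restart_timeout: float,
                  pause: bool = True, tick: float = 1.0) -> Tuple[int, List[str]]:
    """Send one packet per tick; return (packets written, DataNodes dropped from the pipeline)."""
    t = 0.0
    written = 0
    dropped: List[str] = []
    active = list(pipeline)
    for _ in range(packets):
        for dn in list(active):
            waited = 0.0
            # Pipeline pausing: wait for a restarting node instead of failing it immediately.
            while pause and not dn.is_up(t) and waited < restart_timeout:
                t += tick
                waited += tick
            if not dn.is_up(t):
                # Without pausing (or after the timeout) the node is treated as failed,
                # which in HDFS would trigger a pipeline recovery.
                active.remove(dn)
                dropped.append(dn.name)
        if not active:
            raise RuntimeError("every DataNode in the pipeline failed; the write fails")
        written += 1
        t += tick
    return written, dropped
```

With `dn2` restarting between ticks 2 and 5, a pausing writer finishes with nothing dropped, while a non-pausing writer drops `dn2`. If all three DataNodes restart at once, only the pausing writer survives.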

New downgrade functionality

Prior to rolling upgrades, the only way to go back to an older version was to roll back. Rollback returns the cluster to the snapshot taken before the upgrade, so all the data created while the upgrade was underway is lost.

If the cluster was upgraded as a whole, this might be acceptable. But rolling upgrade is an online functionality where the HDFS nodes are upgraded a few at a time. Rolling upgrades could take up to a day for very large clusters. Losing all the data created during that time is not acceptable. Besides, rollback requires restarting the entire cluster, with resulting downtime.

HDFS includes a new mechanism called “Downgrade” to address this. It allows downgrading the software back to the older version without losing newly created data. Downgrade can be executed without cluster downtime: the HDFS service remains available throughout.
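The contrast between the two reversals comes down to what they restore. In this minimal sketch, which uses a hypothetical `ToyCluster` whose state is just a software version and a set of file names, rollback restores both the old version and the old namespace, while downgrade restores only the old version. It does not model the cluster restart that rollback requires.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set, Tuple


@dataclass
class ToyCluster:
    """Hypothetical cluster state: a software version plus the set of files in the namespace."""
    version: str
    files: Set[str] = field(default_factory=set)
    # Pre-upgrade (version, files), standing in for the rollback image.
    saved: Optional[Tuple[str, FrozenSet[str]]] = None

    def start_rolling_upgrade(self, new_version: str) -> None:
        self.saved = (self.version, frozenset(self.files))
        self.version = new_version

    def write(self, name: str) -> None:
        self.files.add(name)

    def _saved_state(self) -> Tuple[str, FrozenSet[str]]:
        if self.saved is None:
            raise RuntimeError("no rolling upgrade in progress")
        return self.saved

    def downgrade(self) -> None:
        """Return to the old software but keep every file written since the upgrade began."""
        old_version, _ = self._saved_state()
        self.version = old_version

    def rollback(self) -> None:
        """Return to the old software *and* the old namespace: post-upgrade writes are lost."""
        old_version, old_files = self._saved_state()
        self.version = old_version
        self.files = set(old_files)
        self.saved = None


def compare() -> tuple:
    """Run the same upgrade twice, reversing it once by downgrade and once by rollback."""
    results = []
    for reverse in ("downgrade", "rollback"):
        cluster = ToyCluster("v1", {"old.txt"})
        cluster.start_rolling_upgrade("v2")
        cluster.write("new.txt")  # data created while the upgrade is underway
        getattr(cluster, reverse)()
        results.append((cluster.version, sorted(cluster.files)))
    return tuple(results)
```

`compare()` returns `(('v1', ['new.txt', 'old.txt']), ('v1', ['old.txt']))`: both paths end on the old software, but only downgrade keeps `new.txt`.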

Steps to perform rolling upgrade

Upgrading an HA HDFS cluster takes only these four steps:

  1. Preparing the rolling upgrade: run a command to prepare the rollback image.
  2. Upgrading the NameNodes: upgrade the NameNode in standby state, fail over to the newly upgraded NameNode, and then upgrade the other NameNode.
  3. Upgrading the DataNodes: upgrade one or a few DataNodes at a time, and repeat until all DataNodes are upgraded.
  4. Finalizing the upgrade: run a command to finalize the rolling upgrade.
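The four steps can be laid out as an ordered plan. This sketch only builds and prints the plan; it runs nothing. The `hdfs dfsadmin -rollingUpgrade`, `hdfs namenode -rollingUpgrade started`, `hdfs haadmin -failover`, `-shutdownDatanode` and `-getDatanodeInfo` commands follow our reading of the Apache Hadoop rolling upgrade documentation, so check them against the documentation for your version. The NameNode IDs `nn1`/`nn2`, the DataNode host names and the Hadoop 2.x default DataNode IPC port 50020 are assumptions, and the host-level stop/install/start actions are left as plain-text notes.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Step:
    phase: str                    # "prepare", "namenodes", "datanodes" or "finalize"
    command: Optional[List[str]]  # argv to run, or None for a host-level action
    note: str


def rolling_upgrade_plan(datanodes: Sequence[str], active_nn: str = "nn1",
                         standby_nn: str = "nn2", batch_size: int = 1,
                         dn_ipc_port: int = 50020) -> List[Step]:
    """Build (but do not run) the ordered steps of an HA HDFS rolling upgrade."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    dfsadmin = ["hdfs", "dfsadmin"]
    plan = [
        Step("prepare", dfsadmin + ["-rollingUpgrade", "prepare"], "create the rollback image"),
        Step("prepare", dfsadmin + ["-rollingUpgrade", "query"],
             "repeat until the rollback image is ready"),
    ]
    # Upgrade the standby NameNode, fail over to it, then upgrade the former active one.
    for i, nn in enumerate((standby_nn, active_nn)):
        plan.append(Step("namenodes", None, f"stop {nn} and install the new version"))
        plan.append(Step("namenodes", ["hdfs", "namenode", "-rollingUpgrade", "started"],
                         f"start {nn} as standby on the new version (run on {nn})"))
        if i == 0:
            plan.append(Step("namenodes", ["hdfs", "haadmin", "-failover", active_nn, standby_nn],
                             f"make the upgraded {standby_nn} active"))
    # Upgrade DataNodes one batch at a time: stop the batch, wait for it, then restart it.
    for start in range(0, len(datanodes), batch_size):
        batch = list(datanodes[start:start + batch_size])
        for dn in batch:
            plan.append(Step("datanodes",
                             dfsadmin + ["-shutdownDatanode", f"{dn}:{dn_ipc_port}", "upgrade"],
                             f"stop {dn} for upgrade"))
        for dn in batch:
            plan.append(Step("datanodes", dfsadmin + ["-getDatanodeInfo", f"{dn}:{dn_ipc_port}"],
                             f"repeat until {dn} has shut down"))
        for dn in batch:
            plan.append(Step("datanodes", None, f"install the new version on {dn} and start it"))
    plan.append(Step("finalize", dfsadmin + ["-rollingUpgrade", "finalize"],
                     "finalize the rolling upgrade"))
    return plan


def render(plan: List[Step]) -> str:
    """Format the plan as shell-style lines: commands prefixed with '$', manual actions with '#'."""
    return "\n".join(f"$ {shlex.join(s.command)}" if s.command else f"# {s.note}" for s in plan)
```

For example, `print(render(rolling_upgrade_plan(["dn1", "dn2", "dn3"], batch_size=2)))` lists the prepare commands first, then the NameNode steps with the failover between them, then two DataNode batches, and finally the finalize command. Batching the shutdowns before the waits mirrors step 3’s “one or a few DataNodes at a time”.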

For more details on the process, please see the documentation.

Automated Rolling Upgrade via Ambari

The rolling upgrade process described above has been further simplified by the automated rolling upgrade feature in Apache Ambari 2.0. The details of the Ambari work are covered in AMBARI-7804 and AMBARI-8146. This work enables fully automated rolling upgrades for HDFS deployments using the Ambari management system. Watch this blog for a detailed post on that, coming very soon.
