May 18, 2015

Standards Based Packaging to Support Rolling Upgrades in HDP

With YARN and HDFS at the architectural center, Hadoop has emerged as a key component of any modern data architecture. Today, enterprises use Hadoop to store critical datasets and power many of their critical workloads. With this in mind, the services and data within a Hadoop cluster need to remain highly available in the face of failures and to keep functioning while the cluster is upgraded to the latest software version.

With the Hortonworks Data Platform (HDP) 2.2, we have enhanced the core platform packaging to support rolling upgrades of the HDP stack while the cluster is actively servicing users. For more details, please see here.

In order to support rolling upgrades, it must be possible to have multiple versions of the Hadoop stack installed side by side as the cluster is rolled from one version to the next. We have taken a different approach here: rather than use proprietary packaging, we leverage standard package management, using RPM and Debian packages.

This standard package management approach lets system administrators continue to use and rely on their existing tooling and best practices. Customers have a choice: use Apache Ambari for automated rolling upgrades, or roll their own solution (using Puppet, Chef, etc.) to manage rolling upgrades.

The need for side-by-side deployments for rolling upgrades

HDP uses a structured rolling upgrade approach to provide a reliable and efficient upgrade with minimal service disruption. This approach drives the need for side-by-side deployments.

[Figure: phases of the HDP rolling upgrade process]

In the Prepare phase, all steps needed to ready the cluster for upgrade are taken before any HDP components are upgraded. The side-by-side install allows the new version of the HDP bits to be installed in place before the upgrade is started, reducing both the time spent and the potential for failures during the upgrade process. This also benefits full-shutdown-and-upgrade approaches, since the new bits can be installed prior to the shutdown.

Once the bits are laid down successfully, HDP components can be upgraded in a rolling fashion across the nodes in the cluster. This rolling upgrade phase requires side by side installs to function:

  • Side-by-side installs allow different components (and their component services) on the same node to run different versions during the rolling upgrade. Even within a single component, its services may run different versions on the same node. For example, with HDFS, the NameNode and the DataNode running on the same node may be at different stages of the upgrade: the NameNode is upgraded first, before the DataNode.
  • Side-by-side installs allow an HDP component to reference and use libraries of the matching version from the other components it depends on.
  • If issues arise during the rolling upgrade, HDP provides a path to downgrade in a rolling fashion. With side-by-side installs, that downgrade path is efficient and reliable.

Working with the new side by side layout

Starting with HDP 2.2, a new version of HDP will be deployed alongside the existing online version of HDP in preparation for an upgrade.

To enable this, the HDP 2.2 RPM and Debian packages include the HDP version number in the package name. This makes each version of a package appear to the package manager as a different package altogether.
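As a rough illustration (the package names below only sketch the versioned naming convention; they are not an exact or exhaustive listing), a node holding two HDP versions has two distinct sets of packages installed:

{code}
# Illustrative only: the HDP build number is embedded in each package name,
# so the same component from two releases installs as two distinct packages.
> rpm -qa | grep hdfs
hadoop_2_2_0_0_2041-hdfs-...
hadoop_2_2_3_0_2611-hdfs-...
{code}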

This versioned naming also allows all HDP artifacts for a given release to be deployed under a versioned directory:

/usr/hdp/$major.$minor.$patch.$build

All versions of HDP are deployed under the fixed directory path of /usr/hdp. For example, the following directory structure showcases two HDP versions deployed side by side:

{code}
/usr/hdp/2.2.0.0-2041
├── /usr/hdp/2.2.0.0-2041/hadoop
│   ├── /usr/hdp/2.2.0.0-2041/hadoop/bin
│   ├── /usr/hdp/2.2.0.0-2041/hadoop/conf -> /etc/hadoop/conf
│   ├── /usr/hdp/2.2.0.0-2041/hadoop/lib
│   │   ├── /usr/hdp/2.2.0.0-2041/hadoop/lib/native
│   ├── /usr/hdp/2.2.0.0-2041/hadoop/libexec
│   ├── /usr/hdp/2.2.0.0-2041/hadoop/man
│   └── /usr/hdp/2.2.0.0-2041/hadoop/sbin
├── /usr/hdp/2.2.0.0-2041/hadoop-hdfs
│   ├── /usr/hdp/2.2.0.0-2041/hadoop-hdfs/bin
│   ├── /usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib
│   ├── /usr/hdp/2.2.0.0-2041/hadoop-hdfs/sbin
│   └── /usr/hdp/2.2.0.0-2041/hadoop-hdfs/webapps
├── /usr/hdp/2.2.0.0-2041/hbase
│   ├── /usr/hdp/2.2.0.0-2041/hbase/bin
│   ├── /usr/hdp/2.2.0.0-2041/hbase/conf -> /etc/hbase/conf
│   ├── /usr/hdp/2.2.0.0-2041/hbase/doc
│   ├── /usr/hdp/2.2.0.0-2041/hbase/include
│   └── /usr/hdp/2.2.0.0-2041/hbase/lib
└── /usr/hdp/2.2.0.0-2041/zookeeper
    ├── /usr/hdp/2.2.0.0-2041/zookeeper/bin
    ├── /usr/hdp/2.2.0.0-2041/zookeeper/conf -> /etc/zookeeper/conf
    ├── /usr/hdp/2.2.0.0-2041/zookeeper/doc
    ├── /usr/hdp/2.2.0.0-2041/zookeeper/lib
    └── /usr/hdp/2.2.0.0-2041/zookeeper/man
{code}

{code}
/usr/hdp/2.2.3.0-2611
├── /usr/hdp/2.2.3.0-2611/hadoop
│   ├── /usr/hdp/2.2.3.0-2611/hadoop/bin
│   ├── /usr/hdp/2.2.3.0-2611/hadoop/conf -> /etc/hadoop/conf
│   ├── /usr/hdp/2.2.3.0-2611/hadoop/lib
│   │   ├── /usr/hdp/2.2.3.0-2611/hadoop/lib/native
│   ├── /usr/hdp/2.2.3.0-2611/hadoop/libexec
│   ├── /usr/hdp/2.2.3.0-2611/hadoop/man
│   └── /usr/hdp/2.2.3.0-2611/hadoop/sbin
├── /usr/hdp/2.2.3.0-2611/hadoop-hdfs
│   ├── /usr/hdp/2.2.3.0-2611/hadoop-hdfs/bin
│   ├── /usr/hdp/2.2.3.0-2611/hadoop-hdfs/lib
│   ├── /usr/hdp/2.2.3.0-2611/hadoop-hdfs/sbin
│   └── /usr/hdp/2.2.3.0-2611/hadoop-hdfs/webapps
├── /usr/hdp/2.2.3.0-2611/hbase
│   ├── /usr/hdp/2.2.3.0-2611/hbase/bin
│   ├── /usr/hdp/2.2.3.0-2611/hbase/conf -> /etc/hbase/conf
│   ├── /usr/hdp/2.2.3.0-2611/hbase/doc
│   ├── /usr/hdp/2.2.3.0-2611/hbase/include
│   └── /usr/hdp/2.2.3.0-2611/hbase/lib
└── /usr/hdp/2.2.3.0-2611/zookeeper
    ├── /usr/hdp/2.2.3.0-2611/zookeeper/bin
    ├── /usr/hdp/2.2.3.0-2611/zookeeper/conf -> /etc/zookeeper/conf
    ├── /usr/hdp/2.2.3.0-2611/zookeeper/doc
    ├── /usr/hdp/2.2.3.0-2611/zookeeper/lib
    └── /usr/hdp/2.2.3.0-2611/zookeeper/man
{code}

With this layout, the HDFS DataNode can be upgraded before the HBase RegionServer. The HBase RegionServer can continue to run, still using the older version of Hadoop and the other component libraries it depends on.

Managing the active version

While multiple HDP versions are deployed on the cluster, each HDP component service on a specific node can have its own active version at a given point in time. For example, on a given node the HDFS DataNode service and the HDFS NameNode service can each have a different active version.

To manage this capability, HDP uses symlinks to point to the active version for each HDP component service.

For example, the Hadoop DataNode service and the Hadoop NameNode service each have a symlink that points to their current version. During an upgrade, this allows the NameNode to be on the newer version while the DataNode is still on the older version.
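As an illustration (output abridged; the versions match the earlier directory listings), the per-service symlinks on a node that is mid-upgrade might look like this:

{code}
# Per-service symlinks on a node mid-upgrade: the NameNode already points
# at the new version while the DataNode still points at the old one.
> ls -l /usr/hdp/current/ | grep hadoop-hdfs
... hadoop-hdfs-namenode -> /usr/hdp/2.2.3.0-2611/hadoop-hdfs
... hadoop-hdfs-datanode -> /usr/hdp/2.2.0.0-2041/hadoop-hdfs
{code}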

Clients and component services each have their own symlink, so active jobs that were scheduled with the old clients can continue running with the old client version even while the component services are being upgraded to the new version.

For example, to upgrade the DataNode on a single machine to the latest version:

{code}
> Stop DataNode
# Set the active version to the newer version
> hdp-select set hadoop-hdfs-datanode 2.2.3.0-2611
> Start DataNode
{code}
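If problems surface and a rolling downgrade is needed, the same mechanism applies in reverse; a minimal sketch, assuming the older version from the directory listings above:

{code}
> Stop DataNode
# Point the DataNode back at the previous version
> hdp-select set hadoop-hdfs-datanode 2.2.0.0-2041
> Start DataNode
{code}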

Utilizing existing tools and scripts

HDP is committed to enabling the same repository management, install tooling, and execution scripts that system administrators already use to operate and manage Hadoop.

Since the packages are standard RPM and Debian packages, ‘yum’ and ‘apt-get’ are used to deploy each HDP component package.
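For example, laying down the bits of a new version ahead of an upgrade is an ordinary package install; the package names below are illustrative of the versioned naming convention, not exact:

{code}
# RHEL/CentOS (illustrative package name)
> yum install hadoop_2_2_3_0_2611-hdfs
# Ubuntu/Debian (illustrative package name)
> apt-get install hadoop-2-2-3-0-2611-hdfs
{code}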

HDP maintains the existing binary locations that execution scripts depend on. For example, /usr/bin/hadoop is maintained as a symlink and points to the active version’s Hadoop binary.

 /usr/bin/hadoop -> <active version>

Let’s look at Apache Hadoop as an example.

Libraries

Hadoop component libraries are no longer found in “/usr/lib/hadoop/”. Now, each Hadoop component’s libraries are referenced through the corresponding directory:

/usr/hdp/current/hadoop-hdfs-namenode/

/usr/hdp/current/hadoop-yarn-resourcemanager

… and so on for each Hadoop component.

For example, you will find the MapReduce examples jar in:

/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
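A quick smoke test can run one of the standard bundled examples against whichever version is currently active:

{code}
# Estimate pi with 2 map tasks and 10 samples each, using the active version
> hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 2 10
{code}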

Daemon scripts

/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh

/usr/hdp/current/hadoop-yarn-resourcemanager/sbin/yarn-daemon.sh

/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh
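As a sketch, the NameNode can be started through that path (in practice this is run as the HDFS service user):

{code}
# Start the NameNode via the daemon script resolved through its per-service symlink
> /usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode
{code}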

Configuration files

Configuration files can be placed in /etc/hadoop/conf, as before.
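As the directory listings above show, each versioned install’s conf directory is a symlink back to /etc/hadoop/conf, so the same configuration is picked up regardless of which version is active:

{code}
# The versioned conf directory resolves to the shared configuration location
> readlink /usr/hdp/2.2.3.0-2611/hadoop/conf
/etc/hadoop/conf
{code}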

Bin Scripts

/usr/bin/hadoop -> /usr/hdp/current/hadoop-client/bin/hadoop

Conclusion

Are you thinking of upgrading your HDP cluster? Try rolling upgrades. The enhanced packaging sets the stage for rolling upgrades of the entire HDP stack, while maintaining support for the package management tooling that enterprise system administrators rely on.
