How To Migrate Your Hadoop Cluster to Hortonworks Data Platform 2.0

Now that Hortonworks Data Platform 2.0 is GA, you may be looking to migrate your Hadoop stack from another version to take advantage of Hadoop 2’s YARN-based architecture. Fortunately, our Professional Services & Support teams are getting a lot of practice at migration from other distributions as more and more customers turn to 100% enterprise-hardened Apache Hadoop for their big data platform.

While any specific migration may have a few gotchas from a vendor lock-in or business integration perspective, this high-level process overview is battle-tested on large-scale production clusters, and we hope it helps you plan for your own migration.

Essential Migration Path

Depending on which distribution you run today and what you want from the move, these are some obvious candidate migration paths.

[Figure: migrate1]

Essential Migration Steps

A Hadoop distribution has multiple Apache components, and possibly some vendor-specific components. This graphic shows best practice for the order in which to migrate the various components. The Hortonworks services team has automated some of the migration steps to simplify the process.

[Figure: migrate2]

Risks and Mitigations

There are risks associated with any data center migration. Here are two key risks and their essential mitigations.

RISK: Data Loss

HDFS is very stable and reliable, and we’ve not seen any data loss in actual migrations. To keep the risk to a minimum, back up your configuration files and the NameNode fsimage before you start, and use safemode as a checkpoint at each step of the upgrade.
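As a rough illustration of the backup measures above, here is a minimal Python sketch that assembles a pre-upgrade backup sequence. It is not the Hortonworks services tooling. The function names and all three paths are hypothetical, and the sketch assumes the `hadoop dfsadmin` command form (on Hadoop 2 the same subcommands are also available as `hdfs dfsadmin`). By default it only prints the commands, so you can review them before running anything.

```python
import shlex
import subprocess
from pathlib import Path


def build_backup_plan(conf_dir, name_dir, backup_dir):
    """Return the pre-upgrade backup steps as a list of argv lists.

    conf_dir, name_dir and backup_dir are placeholders for your own
    config directory, NameNode metadata directory and backup target.
    """
    backup = Path(backup_dir)
    return [
        # Put HDFS into safemode so the namespace cannot change mid-backup.
        ["hadoop", "dfsadmin", "-safemode", "enter"],
        # Write the current namespace to a fresh fsimage (requires safemode).
        ["hadoop", "dfsadmin", "-saveNamespace"],
        # Archive the NameNode metadata directory (fsimage and edits).
        ["tar", "-czf", str(backup / "namenode-meta.tar.gz"),
         "-C", str(name_dir), "."],
        # Archive the cluster configuration files.
        ["tar", "-czf", str(backup / "hadoop-conf.tar.gz"),
         "-C", str(conf_dir), "."],
        # Report the safemode state before moving to the next step.
        ["hadoop", "dfsadmin", "-safemode", "get"],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical paths; substitute the values from your own cluster.
    plan = build_backup_plan(
        conf_dir="/etc/hadoop/conf",
        name_dir="/hadoop/hdfs/namenode",
        backup_dir="/backup/pre-hdp2",
    )
    run_plan(plan)  # dry run: prints the commands without executing them
```

Keeping the plan as data and defaulting to a dry run makes it easy to review the sequence with your operations team before anything touches the NameNode.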

RISK: Application Regression Issues

Test early in development and test environments to identify and implement the config and code changes you need, then run a final series of tests before the production migration. Also refer to this guide on running existing applications on Hadoop 2 YARN.
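One way to spot config changes early is to diff the properties in your old and new `*-site.xml` files. The Python sketch below is an illustration under assumptions, not a Hortonworks tool: the function names and the sample values are made up for the example, and it relies only on the standard Hadoop `<configuration>`/`<property>` file layout. The sample uses the `fs.default.name` to `fs.defaultFS` rename from Hadoop 2 to show how a renamed key surfaces as one removal plus one addition.

```python
import xml.etree.ElementTree as ET


def parse_site_xml(text):
    """Parse the content of a Hadoop *-site.xml file into {name: value}."""
    root = ET.fromstring(text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_configs(old, new):
    """Return (removed, added, changed) between two property dicts."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = {
        key: (old[key], new[key])
        for key in sorted(set(old) & set(new))
        if old[key] != new[key]
    }
    return removed, added, changed


# Made-up sample content standing in for an old and a new core-site.xml.
OLD_CORE_SITE = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://nn.example.com:8020</value></property>
  <property><name>io.file.buffer.size</name><value>4096</value></property>
</configuration>"""

NEW_CORE_SITE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://nn.example.com:8020</value></property>
  <property><name>io.file.buffer.size</name><value>131072</value></property>
</configuration>"""

if __name__ == "__main__":
    removed, added, changed = diff_configs(
        parse_site_xml(OLD_CORE_SITE), parse_site_xml(NEW_CORE_SITE)
    )
    print("removed:", removed)
    print("added:  ", added)
    print("changed:", changed)
```

In practice you would read the file contents from the old and new config directories and review each removed, added, or changed property against the applications that depend on it.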

Here to Help

Hadoop migrations benefit from practice, and we’re getting good at them as more and more customers turn to 100% enterprise-hardened Apache Hadoop for their big data platform. In fact, this essential guidance has been used with many customers to migrate thousands of nodes already this year. Of course, we anticipate that this will increase significantly with the availability of the YARN-based Hortonworks Data Platform 2.0.

We hope this brief guide helps you plan for your own migration. For specific migration questions, feel free to contact Hortonworks, or visit our website to find out more about Hortonworks Data Platform 2.0.
