How To Migrate Your Hadoop Cluster to Hortonworks Data Platform 2.0
Now that Hortonworks Data Platform 2.0 is GA, you may be looking to migrate your Hadoop stack from another version to take advantage of Hadoop 2’s YARN-based architecture. Fortunately, our Professional Services & Support teams are getting a lot of practice at migration from other distributions as more and more customers turn to 100% enterprise-hardened Apache Hadoop for their big data platform.
While any specific migration may have a few gotchas from a vendor lock-in or business integration perspective, this high-level process overview is battle-tested on large-scale production clusters, and we hope it helps you plan for your own migration.
Essential Migration Path
Depending on your current distribution and your goals, these are some obvious candidates for migration.
Essential Migration Steps
A Hadoop distribution has multiple Apache components, and possibly some vendor-specific components. This graphic shows best practice for the order in which to migrate the various components. The Hortonworks services team has automated some of the migration steps to simplify the process.
Risks and Mitigations
There are risks associated with any data center migration. Here are two key risks and their essential mitigations.
RISK: Data Loss
HDFS is very stable and reliable, and we’ve not seen any data loss in actual migrations. Use proactive measures such as backing up the configuration and the fsimage to keep the risk to a minimum. To guard against data loss, use safemode to run checks at each step of the upgrade.
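As a rough illustration of the backup and safemode measures mentioned above, here is a minimal Python sketch that assembles the commands for a pre-upgrade metadata backup. It is not the Hortonworks services tooling. The function names, the directory paths, and the ordering are assumptions made for this example. The `dfsadmin` subcommands used (`-safemode`, `-saveNamespace`, `-fetchImage`) exist in the Hadoop 2 command line; on an older source cluster the entry point may be `hadoop` rather than `hdfs`, and `-fetchImage` may not be available, in which case copying the NameNode metadata directory is the usual fallback.

```python
import subprocess
from typing import List


def backup_plan(conf_dir: str, backup_dir: str, hdfs_cli: str = "hdfs") -> List[List[str]]:
    """Return the ordered commands for a pre-upgrade backup (hypothetical helper)."""
    return [
        # Archive the configuration directory so it can be restored or compared later.
        ["tar", "-czf", f"{backup_dir}/hadoop-conf.tar.gz", "-C", conf_dir, "."],
        # Enter safemode so the namespace is read-only while it is saved.
        [hdfs_cli, "dfsadmin", "-safemode", "enter"],
        # Write the in-memory namespace out to a fresh fsimage.
        [hdfs_cli, "dfsadmin", "-saveNamespace"],
        # Download the most recent fsimage from the NameNode into backup_dir.
        [hdfs_cli, "dfsadmin", "-fetchImage", backup_dir],
        # Report the safemode state as a final check before moving on.
        [hdfs_cli, "dfsadmin", "-safemode", "get"],
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the plan at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Paths are placeholders; the dry run only prints the commands.
    run_plan(backup_plan("/etc/hadoop/conf", "/backup/pre-upgrade"))
```

Keeping the plan as plain data makes it easy to review the commands, or to dry-run them on a test cluster, before anything touches production.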
RISK: Application Regression Issues
Test early in development and test environments to identify and implement the required configuration and code changes, then run a final series of tests before the production migration. Also refer to this guide on running existing applications on Hadoop 2 YARN.
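One way to spot configuration changes during early testing is to compare the property files of the old and new clusters. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and it assumes the standard Hadoop `*-site.xml` layout of `<property>` elements with `<name>` and `<value>` children. The sample property values are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_site_xml(text: str) -> Dict[str, str]:
    """Map property names to values from the text of a Hadoop *-site.xml file."""
    root = ET.fromstring(text.strip())
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_configs(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """List the property names that were removed, added, or changed in value."""
    return {
        "removed": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


OLD_MAPRED_SITE = """
<configuration>
  <property><name>mapred.job.tracker</name><value>jt.example.com:8021</value></property>
  <property><name>mapred.child.java.opts</name><value>-Xmx512m</value></property>
</configuration>
"""

NEW_MAPRED_SITE = """
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapred.child.java.opts</name><value>-Xmx1024m</value></property>
</configuration>
"""

if __name__ == "__main__":
    report = diff_configs(parse_site_xml(OLD_MAPRED_SITE), parse_site_xml(NEW_MAPRED_SITE))
    for kind, names in report.items():
        print(kind, names)
```

A report like this does not decide which settings are correct; it only narrows down what to review and test before the production migration.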
Here to Help
Hadoop migrations benefit from practice, and we’re getting good at them as more and more customers turn to 100% enterprise-hardened Apache Hadoop for their big data platform. In fact, this guidance has already been used with many customers to migrate thousands of nodes this year. We anticipate that this number will increase significantly with the availability of the YARN-based Hortonworks Data Platform 2.0.
We hope this brief guide helps you plan for your own migration. For specific migration questions, feel free to contact Hortonworks, or visit our website to find out more about Hortonworks Data Platform 2.0.