Luminar is one of Hortonworks’ original customers. Apache Hadoop is a pillar of their modern data architecture, and since choosing Hortonworks in 2012, the Luminar team has become expert users of Hortonworks Data Platform (HDP) version 1.
They were eager to migrate to HDP2 after it launched in October 2013.
I recently spoke with Juan Manuel Alonso, Luminar’s Manager of Insights. Juan Manuel worked with the Hortonworks professional services team to plan and execute the migration from HDP1 to HDP2.
Q: Now that you’ve had a couple of months to run HDP version 2, what are the biggest differences compared to version 1?
A: The first thing we noticed was the performance. I would estimate that this new version of Hive is at least 40% faster than the version we were running before.
We also noticed a remarkable performance improvement in the connection between Hive and Tableau. Now we can work with very large datasets directly in Tableau, without the pre-aggregation and other workarounds we needed before. This is probably because of Hive’s expanded support for SQL data types such as VARCHAR and DATE.
The second big difference we noticed is the new version of Ambari. With Ambari version 1.4.1, we can administer the clusters very easily. This was one of the most remarkable updates.
Monitoring the cluster is very easy. I can check on cluster status, node functionality, performance, CPU usage and memory usage.
With Ambari, it’s also very easy to manage the cluster, such as starting and stopping the services. I can use Ambari’s graphical user interface to add a new node in about twenty minutes. It’s very easy and reliable for provisioning.
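The kind of monitoring and management described above is exposed through Ambari's v1 REST API as well as its graphical interface. The sketch below is illustrative only: the Ambari host, cluster name, and credentials are assumptions, and the summarizing helper works on an already-parsed JSON response rather than a live call.

```python
import base64
from urllib.request import Request

AMBARI_URL = "http://ambari.example.com:8080"  # hypothetical Ambari host
CLUSTER = "luminar_prod"                       # hypothetical cluster name

def ambari_request(path, user="admin", password="admin"):
    """Build an authenticated request against Ambari's v1 REST API."""
    req = Request(f"{AMBARI_URL}/api/v1/clusters/{CLUSTER}{path}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("X-Requested-By", "ambari")
    return req

def summarize_states(host_components):
    """Count host components by state from a parsed Ambari JSON response."""
    counts = {}
    for item in host_components:
        state = item["HostRoles"]["state"]
        counts[state] = counts.get(state, 0) + 1
    return counts
```

A script like this could poll, say, `/services/HDFS` after a restart and report how many components are STARTED versus INSTALLED, which mirrors the status checks Ambari's UI performs.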
We’re still exploring the benefits of YARN. We know that YARN is responsible for many of the performance improvements we’re seeing, and we already know that we want to add Mahout and certain machine learning applications on the data. I think we will discover more of YARN’s benefits when we do that.
Q: Juan Manuel, how did you work with the Hortonworks professional services team to plan for such an important migration?
A: Well, first we had two 1-hour phone calls with the Hortonworks team to define the scope.
We had full confidence in the Hortonworks team, but just to be prudent, we planned for a migration during Thanksgiving, when people would be out of the office. About ten days before Thanksgiving, I met with Mike Perez for an overview of what we needed to do. We agreed on a schedule and discussed the environment and tools.
Five days before the migration, I met for another hour with Leonid Fedotov for a walkthrough of the migration process. Then I filled out a questionnaire with twenty questions. The questions were very clear and I didn’t need to read any additional documentation.
One day before the upgrade, Leonid sent me a document with step-by-step instructions on how we would do the migration.
I spent a total of three hours preparing for the migration: two one-hour meetings and another hour filling out the questionnaire.
Q: What areas did you focus on while planning the migration?
A: First, we needed to know that we wouldn’t lose any data during the migration. We wanted to know if we needed to back up any of the data.
Second, we wanted to know that the migration wouldn’t break any of our tools. We couldn’t lose any processing capability immediately after moving to HDP 2, and we wondered if we would need to make any changes to our Python code or MapReduce jobs after moving to version 2.
Finally, we did have a small fear that the entire system would come down as a result of the migration.
This wasn’t a big fear, but it did cross our minds. So we planned the migration on Thanksgiving Day, to give us some breathing room in case anything unexpected came up.
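One common way to address the first concern, verifying that no data is lost, is to snapshot HDFS file counts and sizes before the migration and compare them afterwards, for example from `hadoop fs -count` output (which prints directory count, file count, content size, and path). The helpers below are a sketch of that idea, not part of the migration plan Leonid provided; the paths and snapshot structure are assumptions.

```python
def parse_count(line):
    """Parse one line of `hadoop fs -count` output:
    DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME"""
    dirs, files, size, path = line.split(None, 3)
    return {"path": path, "dirs": int(dirs), "files": int(files), "bytes": int(size)}

def diff_snapshots(before, after):
    """Compare pre- and post-migration snapshots keyed by HDFS path.

    Each snapshot maps a path to the dict produced by parse_count().
    Returns a dict describing any path that is missing or changed.
    """
    deltas = {}
    for path, pre in before.items():
        post = after.get(path)
        if post is None:
            deltas[path] = "missing after migration"
        elif (pre["files"], pre["bytes"]) != (post["files"], post["bytes"]):
            deltas[path] = "file count or size changed"
    return deltas
```

An empty result from `diff_snapshots` is a quick, cheap signal that the data survived intact, complementing deeper checks such as `hdfs fsck`.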
Q: Tell me what happened during the migration. How did you and the Hortonworks professional services team work together to get it done?
A: On the day of the migration, Leonid worked with me, according to the plan that we’d created beforehand.
I shared my desktop and we both tracked from the same plan document. He led me through each command and together we made sure that everything executed properly.
We started at about 9am Pacific Time on Thanksgiving Day. Leonid had estimated about 7-8 hours to complete 95% of the migration process, and that’s how long it took.
At the end of the day, we were sure that all of the data was OK.
We also reviewed some of the applications and tools. We executed some Hive queries and ran some of the simpler MapReduce jobs in Python. We looked at Tableau. Everything was working.
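Python MapReduce jobs on Hadoop typically run through Hadoop Streaming, where the mapper and reducer are plain scripts reading stdin and writing tab-separated key/value pairs to stdout. The word-count sketch below is a generic example of that pattern, not one of Luminar's actual jobs, but a job of this shape makes a good post-migration smoke test because it can also be exercised locally with a shell pipe.

```python
#!/usr/bin/env python
"""Minimal Hadoop Streaming job: run with `map` or `reduce` as the argument."""
import sys
from itertools import groupby

def mapper(lines):
    # Emit one "word<TAB>1" record per word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # Hadoop delivers mapper output sorted by key, so groupby sees
    # all counts for a given word consecutively.
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = mapper if stage == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

Locally, `cat input.txt | python wc.py map | sort | python wc.py reduce` approximates what the cluster does; on the cluster, the same script would be passed to the Hadoop Streaming jar via its `-mapper` and `-reducer` options.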
The next day, I went back to run more complete tests on our tools: more complex Hive queries, creating tables, and testing the new features in 2.0 to make sure they were working.
Everything worked, so by the second day, we knew that we would be fine to resume our full suite of workloads on Monday, without any disruption.
Q: So everything worked as expected?
A: There were only two minor issues. After the migration, Ganglia and Oozie had some trouble at first, but they didn’t stop our work. We worked with Leonid for about 2 hours and he fixed all of the Oozie issues and most of the Ganglia issues.
By Monday, I had 100% of the cluster migrated and up and running as expected.
The Hortonworks team was very well prepared and all the steps ran smoothly. The migration was very fast and efficient, and all of it was done in our production environment.
We wouldn’t have been able to migrate from Hadoop 1 to Hadoop 2 on our own. It required deep understanding of the platform, which the Hortonworks team shared with me during the migration.
Q: How about support? Did you work with the Hortonworks Enterprise Support team during the migration?
A: The Hortonworks Support team was standing by to prioritize any issues still outstanding on the Monday after the migration.
As it happened, we didn’t need to call on them, but it helped to know that they were ready if needed.
Thanks to Juan Manuel for taking the time to share his thoughts on the migration process. Learn more about Luminar’s use of HDP here.