March 26, 2018

How Hotels.com Migrates Big Datasets to the Cloud

We’re less than a month away from DataWorks Summit Berlin (April 16-19)! We have a number of impressive keynote and breakout speakers lined up. Two of these speakers are Adrian Woodhead, Principal Engineer, and Elliot West, Senior Engineer, at Hotels.com, who will present within the Data Processing and Warehousing track. Hotels.com is an affiliate of Expedia Inc. and is a website for booking hotel rooms online and by telephone. The company has 85 websites in 34 languages, and lists over 325,000 hotels in approximately 19,000 locations. Its inventory includes everything from international chains and all-inclusive resorts to local favorites, bed & breakfasts, condos and other types of commercial lodging. The website provides all the information needed to book the perfect stay.

The title of Hotels.com’s breakout session is “Tools and Approaches for Migrating Big Datasets to the Cloud.” The presentation will highlight the journey taken by the big data platform team when tasked with migrating big datasets and pipelines from on-premises clusters to cloud-based platforms. This includes two open source tools that the team built to overcome the unexpected challenges it faced.

From the breakout session abstract:

“The first of these tools is Circus Train, a dataset replication tool that copies Hive tables between clusters and clouds. The second tool is Waggle Dance, a federated Hive query service that enables querying of data stored across multiple Hive metastores. Giving real world examples, we will describe how we’ve used these tools to successfully build a petabyte scale platform that is now also being used by other brands within the Expedia organisation.”

In the hospitality industry, building a 360-degree view of the customer is crucial. It enables organizations to interact with customers across multiple channels. Organizations use predictive analytics to find connections and relationships in customer behavior, align their processes more closely with buyer patterns, and ultimately improve customer experiences. Hotels.com is looking forward to attending DataWorks Summit and interacting with its peers:

“Hotels.com’s data teams are engaged in an epic migration journey moving our on-premises data processing to the cloud. Along the way we’ve learnt a lot and developed tools that have proven very useful. Our hope is that by open sourcing these and presenting them at the DataWorks Summit, we can encourage others in the big data community to join us by contributing code, ideas, comments and constructive criticism. We hope to engage with other cloud-bound travelers attending the summit and share war stories, good experiences and hopefully find common patterns and approaches that make all our lives easier.”

Be sure to check out Hotels.com’s session to learn which technologies are in place and how the business continues its Big Data journey. The goal of the session is to help others who are early in their own journey build a solid foundation. This will definitely be a breakout session you won’t want to miss! Register now and view all the abstracts here.


