Hadoop Tutorials: Real Life Use Cases in the Sandbox
One of the goals of the Hortonworks Sandbox is to showcase end-to-end use cases for Hadoop. In the most current release of Hadoop tutorials, you’ll find two specific use cases highlighted, both built around clickstream data. There are six new tutorials for you to walk through: Tutorials 6-11.
(Update: if your version of Sandbox does not have “Enable Ambari” on the introductory page, you will need to download the latest version of the Sandbox in order to have access to these tutorials.)
Clickstream Analysis – Website User Behavior
Tutorials 6-10 are extensive, step-by-step lessons that walk you through connecting the Sandbox to Excel 2013 via the Hortonworks ODBC driver to access and analyze semi-structured data (like Omniture logs). Here are some highlights of the new tutorials:
Tutorial 6 – Loading Data into the Hortonworks Sandbox
This covers the basics of bringing data into the Sandbox. In this example, we’ve provided access to anonymized Omniture logs, but you can bring your own data into the Sandbox: your own log data, Twitter feeds, etc. The Sandbox is a fully functional personal Hadoop environment where you can add your own datasets to validate the Hadoop use cases in your environment.
Tutorials 7 & 11 – Installing the ODBC Driver in the Hortonworks Sandbox (Windows and Mac)
You can download the Hortonworks ODBC driver, connect it to the Sandbox, and then use that connection with your favorite visualization or business intelligence tool. These tutorials will help you with the setup and connection. Once it’s set up, connect to Excel, Tableau, Alteryx, or any other business intelligence tool that supports ODBC.
Tutorials 8 & 9 – Accessing and Analyzing Data in Excel
Imagine being able to take that semi-structured data from Tutorial 6 and surface it in Excel. You’ll be able to do that on your own laptop when you follow the step-by-step lessons in Tutorials 8 & 9.
Tutorial 10 – Visualizing Clickstream Data
Here you will see another end-to-end example of visualizing clickstream data – but in this case weblog data is combined with CRM data to visualize actual customer behavior. This tutorial assumes that you’ve got the ODBC driver and Excel 2013 installed. Even if you don’t have Excel 2013, you can use your favorite visualization tool to play with the dataset.
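To make the idea of combining weblog data with CRM data more concrete, here is a minimal Python sketch of that kind of join. It is not taken from the tutorial: the sample rows, the field names (user_id, url, gender, state) and the helper functions are all hypothetical, chosen only to illustrate attaching customer attributes to click records and then summarizing them.

```python
from collections import Counter

# Hypothetical sample rows; the tutorial's real datasets use different fields.
clicks = [
    {"user_id": "u1", "url": "/products/shoes"},
    {"user_id": "u2", "url": "/products/hats"},
    {"user_id": "u1", "url": "/checkout"},
    {"user_id": "u3", "url": "/products/shoes"},  # no CRM record for u3
]

# Hypothetical CRM lookup keyed by the same user identifier.
crm = {
    "u1": {"gender": "F", "state": "CA"},
    "u2": {"gender": "M", "state": "NY"},
}


def join_clicks_with_crm(clicks, crm):
    """Attach CRM attributes to each click (inner join on user_id)."""
    joined = []
    for click in clicks:
        customer = crm.get(click["user_id"])
        if customer is not None:  # skip clicks with no matching customer
            joined.append({**click, **customer})
    return joined


def page_views_by_state(joined):
    """Count joined click rows per customer state."""
    return Counter(row["state"] for row in joined)


joined = join_clicks_with_crm(clicks, crm)
views = page_views_by_state(joined)
print(views)
```

In the Sandbox this kind of join would be done over the loaded tables rather than in-memory lists; the sketch only shows the shape of the result that a visualization tool would then chart.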
With these new tutorials, you can easily work with your own data in the Sandbox and start seeing where the Hortonworks Data Platform can help you find insights into your own business. If you are looking for publicly available data to use with these Hadoop tutorials in the Sandbox, here are some suggestions:
- Million Song Dataset, official website by Thierry Bertin-Mahieux
- US Federal Government: Open Data Initiative
- Indian Government: Open Data Initiative
- UK Government: Open Data Initiative
- Canadian Government: Open Data Initiative
Ready to work on your own real-life example? Download the Sandbox now.