Hadoop Tutorials: Real Life Use Cases in the Sandbox

One of the goals of the Hortonworks Sandbox is to showcase end-to-end use cases for Hadoop. The most recent release of Hadoop tutorials highlights two specific use cases, both built around clickstream data. There are 6 new tutorials for you to walk through: Tutorials 6 through 11.

(Update: if your version of Sandbox does not have “Enable Ambari” on the introductory page, you will need to download the latest version of the Sandbox in order to have access to these tutorials.)

Clickstream Analysis – Website User Behavior


Hadoop Tutorials in Hortonworks Sandbox

Tutorials 6-10 are extensive, step-by-step lessons that walk you through connecting the Sandbox to Excel 2013 via the Hortonworks ODBC driver so you can access and analyze semi-structured data (such as Omniture logs). Here are some highlights of the new tutorials:

Tutorial 6 – Loading Data into the Hortonworks Sandbox

This covers the basics of bringing data into the Sandbox. In this example, we’ve provided access to anonymized Omniture logs, but you can also bring your own data into the Sandbox: your own log data, Twitter feeds, etc. The Sandbox is a fully functional personal Hadoop environment where you can add your own datasets to validate the Hadoop use cases in your environment.

Tutorials 7 & 11 – Installing the ODBC Driver in the Hortonworks Sandbox (Windows and Mac)

You can download the Hortonworks ODBC driver, connect it to the Sandbox, and then use that connection with your favorite visualization or business intelligence tool. These tutorials will help you with the setup and connection. Once it’s set up, connect to Excel, Tableau, Alteryx, or any other business intelligence tool that supports ODBC.

Tutorials 8 & 9 – Accessing and Analyzing Data in Excel

Imagine being able to take that semi-structured data from Tutorial 6 and surface it in Excel. You’ll be able to do that on your own laptop when you follow the step-by-step lessons in Tutorials 8 & 9.

Data visualization in Excel

Tutorial 10 – Visualizing Clickstream Data

Combining CRM and weblog data

Here you will see another end-to-end example of visualizing clickstream data – but in this case weblog data is combined with CRM data to visualize actual customer behavior. This tutorial assumes that you’ve got the ODBC driver and Excel 2013 installed. Even if you don’t have Excel 2013, you can use your favorite visualization tool to play with the dataset.
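The idea of combining weblog data with CRM data can be sketched as a simple join on a shared user identifier. The following is a minimal Python illustration only, not the tutorial's actual method (the tutorial does this inside the Sandbox and Excel). The field names `user_id`, `url`, `name` and `segment`, the helper functions, and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical clickstream records: one row per page view.
weblog = [
    {"user_id": "u1", "url": "/home"},
    {"user_id": "u1", "url": "/products"},
    {"user_id": "u2", "url": "/home"},
    {"user_id": "u3", "url": "/checkout"},
]

# Hypothetical CRM records keyed by the same user identifier.
crm = {
    "u1": {"name": "Alice", "segment": "gold"},
    "u2": {"name": "Bob", "segment": "silver"},
}


def join_clicks_with_crm(clicks, customers):
    """Attach CRM attributes to each click; drop clicks with no CRM match."""
    joined = []
    for click in clicks:
        customer = customers.get(click["user_id"])
        if customer is not None:
            joined.append({**click, **customer})
    return joined


def page_views_by_segment(joined_rows):
    """Count page views per customer segment."""
    counts = defaultdict(int)
    for row in joined_rows:
        counts[row["segment"]] += 1
    return dict(counts)


joined = join_clicks_with_crm(weblog, crm)
views = page_views_by_segment(joined)
print(views)
```

Clicks from visitors with no CRM record (here `u3`) are dropped, which mirrors an inner join. Keeping them instead would be the choice if anonymous traffic matters to the analysis.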


With these new tutorials, you can work with your own data in the Sandbox and start seeing where the Hortonworks Data Platform can help you find insights into your own business. If you are looking for publicly available data to use with the Sandbox to apply these Hadoop tutorials against, here are some suggestions:

Ready to work on your own real-life example? Download the Sandbox now.
