August 30, 2013

How To Capitalize on Clickstream data with Hadoop

In the last 60 seconds there were 1,300 new mobile users and 100,000 new tweets, and Amazon brought in $83,000 worth of sales. As you contemplate what happens in an internet minute, what would be the impact of being able to answer:

  • What is the most efficient path for a site visitor to research a product and then buy it?
  • What products do visitors tend to buy together, and what are they most likely to buy in the future?
  • Where should you spend resources on fixing or enhancing the user experience of your website?

In the Hortonworks Sandbox, you can run a simulation of website clickstream behavior to see where users are located and what they are doing on the site. This tutorial provides a dataset for a fictitious website and the behavior of its visitors over a five-day period. The dataset, roughly 4 million lines, is easily ingested into the Sandbox's single-node cluster via HCatalog.
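As a sketch of what that ingest step looks like in Hive (the table name, column names, and file path below are illustrative assumptions, not the tutorial's actual schema), a delimited log file uploaded to the Sandbox can be declared as a table and loaded:

```sql
-- Illustrative only: schema, delimiter, and path are assumptions.
CREATE TABLE clickstream_raw (
  click_time  STRING,
  visitor_ip  STRING,
  url         STRING,
  city        STRING,
  country     STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Load the uploaded log file into the table.
LOAD DATA INPATH '/user/hue/clickstream.tsv' INTO TABLE clickstream_raw;
```

Once the table is registered, HCatalog makes it visible to the other tools in the stack (Pig, MapReduce) without redefining the schema.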


In this tutorial, you’ll also learn how to combine datasets. Once you have the clickstream data in the Sandbox, you’ll combine it with two other datasets provided: user data and product data. Combining them is easily done with Hive.
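A combination like that is just a Hive join. In this sketch, the table and column names (including the join keys) are hypothetical stand-ins for the tutorial's actual datasets:

```sql
-- Illustrative join of clicks with user and product attributes;
-- names and join keys are assumptions, not the tutorial's schema.
CREATE TABLE webloganalytics AS
SELECT c.click_time,
       c.url,
       u.age,
       u.gender,
       p.category
FROM clickstream_raw c
LEFT JOIN users    u ON c.visitor_ip = u.ip_address
LEFT JOIN products p ON c.url        = p.url;
```

Materializing the joined result as its own table means the visualization tool only has to read one flat table rather than repeat the join.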


Once you have the combined dataset, you can use a visualization tool to see where the customers are and what products they are looking at. In this tutorial, we show you how to do this in Excel, but you could just as easily use Tableau, Alteryx, or an open source tool like BIRT.
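The data fed to the visualization tool is typically a small aggregate rather than the raw joined table. A hedged example of the kind of summary query that might back an Excel chart (again using the hypothetical tables above):

```sql
-- Illustrative aggregation: clicks per product category,
-- suitable for charting in Excel or another BI tool.
SELECT p.category,
       COUNT(*) AS clicks
FROM clickstream_raw c
JOIN products p ON c.url = p.url
GROUP BY p.category
ORDER BY clicks DESC;
```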


Once you’ve completed the tutorial, you can easily add your own datasets to see how your own customers move through your website and start capitalizing on each minute you have.

Don’t have the Sandbox? Download it here. Find more use cases for big data analytics here.



TR RAO says:

Respected Sir,
I installed the Hortonworks Sandbox and connected to it, but in the file browser it is showing hue instead of hdfs.
My doubt is: after uploading a dataset, how do I apply MapReduce to it?

Cheryle Custer says:

You can log in as root to get to the command-line interface to run MapReduce jobs. See the Hortonworks Forums for assistance.

General set up:
Login info:
User name: root
Password: hadoop
