How to Capitalize on Clickstream Data with Hadoop
In the last 60 seconds, there were 1,300 new mobile users and 100,000 new tweets. While you contemplated what happens in an internet minute, Amazon brought in $83,000 worth of sales. What would the impact be if you could identify:
- What is the most efficient path for a site visitor to research a product, and then buy it?
- What products do visitors tend to buy together, and what are they most likely to buy in the future?
- Where should I spend resources on fixing or enhancing the user experience on my website?
In the Hortonworks Sandbox, you can run a simulation of website Clickstream behavior to see where users are located and what they are doing on the website. This tutorial provides a dataset for a fictitious website that records the behavior of its visitors over a five-day period. The dataset has 4 million lines and is easily ingested into the single-node cluster of the Sandbox via HCatalog.
In this tutorial, you’ll also learn how to combine datasets. Once you have the Clickstream data in the Sandbox, you’ll combine it with the two other datasets provided: User data and Product data. This combination is easily achieved using Hive.
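To make the idea of combining the three datasets concrete, here is a minimal Python sketch of the join logic on a handful of in-memory rows. It is only an illustration: in the Sandbox the join is done in Hive, and the field names used here (`user_id`, `url`, `state`, `category`) and the sample values are assumptions, not the tutorial's actual schema.

```python
from collections import Counter

# Hypothetical sample rows; the real tutorial datasets use their own columns.
clicks = [
    {"user_id": "u1", "url": "/p/shoes"},
    {"user_id": "u1", "url": "/p/hats"},
    {"user_id": "u2", "url": "/p/shoes"},
    {"user_id": "u1", "url": "/p/shoes"},
    {"user_id": "u3", "url": "/p/shoes"},  # visitor with no matching user row
]
users = {"u1": {"state": "CA"}, "u2": {"state": "NY"}}
products = {
    "/p/shoes": {"category": "footwear"},
    "/p/hats": {"category": "accessories"},
}


def enrich(clicks, users, products):
    """Left-join each click to its user and product rows by key.

    Clicks without a matching user or product are kept, with None
    in the missing fields.
    """
    enriched = []
    for click in clicks:
        user = users.get(click["user_id"], {})
        product = products.get(click["url"], {})
        enriched.append({
            **click,
            "state": user.get("state"),
            "category": product.get("category"),
        })
    return enriched


rows = enrich(clicks, users, products)

# Count page views per (state, category) pair, the kind of summary
# a visualization tool could then plot.
views = Counter((r["state"], r["category"]) for r in rows)
for key, count in sorted(views.items(), key=lambda kv: -kv[1]):
    print(key, count)
```

The combined rows carry both where the visitor is and what kind of product was viewed, which is the shape of data the later visualization step relies on.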
Once you have these combined datasets, you can use a visualization tool to see where the customers are and what products they are looking at. In this tutorial, we show you how to do this in Excel, but you could just as easily do it in Tableau, Alteryx, or an open source tool like BIRT.
Once you’ve completed the tutorial, you can add your own datasets to see how your own customers move through your website and start capitalizing on each minute you have.
Try it with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.