August 30, 2013

How To Capitalize on Clickstream Data with Hadoop

In the last 60 seconds there were 1,300 new mobile users and 100,000 new tweets. And while you were contemplating what happens in an internet minute, Amazon brought in $83,000 worth of sales. What would be the impact if you could answer questions like these:

  • What is the most efficient path for a site visitor to research a product and then buy it?
  • What products do visitors tend to buy together, and what are they most likely to buy in the future?
  • Where should you spend resources on fixing or enhancing the user experience on your website?

In the Hortonworks Sandbox, you can run a simulation of website clickstream behavior to see where users are located and what they are doing on the site. This tutorial provides a dataset for a fictitious website, capturing visitor behavior over a five-day period. The 4-million-line dataset is easily ingested into the Sandbox’s single-node cluster via HCatalog.
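
If you want a feel for what that ingest step looks like as HiveQL, here is a minimal sketch that creates a table over a tab-delimited log file and loads it. The table name, column list, and file path are illustrative assumptions, not the tutorial's exact schema, which has many more columns.

    -- Hypothetical table over the raw clickstream log (schema abbreviated).
    CREATE TABLE omniture_raw (
        log_date  STRING,
        url       STRING,
        ip        STRING,
        swid      STRING,   -- visitor id, used later to join against the user data
        city      STRING,
        state     STRING,
        country   STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Load the file you uploaded through the Sandbox file browser (example path).
    LOAD DATA INPATH '/user/hue/Omniture.0.tsv' INTO TABLE omniture_raw;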

[Screenshot: the raw Omniture clickstream log]

In this tutorial, you’ll also learn how to combine datasets. Once you have the clickstream data in the Sandbox, you’ll combine it with two other datasets provided: user data and product data. This combination is easily achieved with Hive.
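
Under the hood that combination is just a Hive join. The sketch below shows what such a query could look like; the users and products tables, their columns, and the join keys are assumptions made for illustration, so expect the tutorial's own query to differ in its details.

    -- Illustrative join of the clickstream log with the user and product datasets.
    CREATE TABLE webloganalytics AS
    SELECT o.log_date,
           o.url,
           o.ip,
           o.city,
           o.state,
           o.country,
           p.category,
           u.gender_cd AS gender,
           u.birth_dt  AS birth_date
    FROM omniture_raw o
    LEFT OUTER JOIN users    u ON o.swid = u.swid
    LEFT OUTER JOIN products p ON o.url  = p.url;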

[Screenshot: the combined weblog analytics query in Hive]

Once you have the combined dataset, you can use a visualization tool to see where your customers are and what products they are looking at. In this tutorial we show how to do this in Excel, but you could just as easily do it in Tableau, Alteryx, or an open source tool like BIRT.
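
Behind a map or chart like that is usually nothing more than a Hive rollup pulled into the tool over the Hive ODBC driver. The hypothetical query below, reusing the column names assumed in the sketches above, counts page views by state and product category:

    -- Example rollup: product category interest by U.S. state.
    SELECT state,
           category,
           COUNT(*) AS page_views
    FROM webloganalytics
    WHERE country = 'usa'
    GROUP BY state, category
    ORDER BY page_views DESC;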

[Screenshot: product category breakdown for Florida]

Once you’ve completed the tutorial, you can easily add your own datasets to see how your own customers move through your website and start capitalizing on each minute you have.

Don’t have the Sandbox? Download it here. Find more use cases for big data analytics here.

Comments

  • Respected Sir,
    I installed the Hortonworks Sandbox and connected to it, but the file browser shows hue instead of hdfs.
    My question is: after uploading a dataset, how do I apply MapReduce to it?
