
Visualize Website Clickstream Data

Visualize Log Data with Apache Zeppelin

Introduction

In this section, we will use Apache Zeppelin to access refined clickstream data.

Prerequisites

Outline

If you don’t have access to Microsoft Excel Professional Plus, you can use Apache Zeppelin for your data visualization instead.

Import the Zeppelin Notebook

Great! You have met the requirements and are ready to begin. If you run into any issues along the way, check out the Getting Started with Apache Zeppelin tutorial.

To import the notebook, go to the Zeppelin home screen.

1. Click Import note

2. Select Add from URL

3. Copy and paste the following URL into the Note URL field

https://raw.githubusercontent.com/hortonworks/data-tutorials/master/tutorials/hdp/visualize-website-clickstream-data/assets/ClickstreamAnalytics.json

4. Click on Import Note

Once your notebook is imported, you can open it from the Zeppelin home screen by:

5. Clicking ClickstreamAnalytics

Once the ClickstreamAnalytics notebook is up, follow all the directions within the notebook to complete the tutorial.
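
If you would rather script the import than click through the UI, the steps above can be sketched in Python against Zeppelin's notebook REST API (POST /api/notebook/import). This is a minimal sketch and not part of the original tutorial: the Zeppelin address http://localhost:9995 is an assumption, the function names are hypothetical, and a Zeppelin instance with authentication enabled would need a login step that is not shown.

```python
import json
import urllib.request

# URL of the notebook, taken from step 3 above.
NOTE_URL = (
    "https://raw.githubusercontent.com/hortonworks/data-tutorials/master/"
    "tutorials/hdp/visualize-website-clickstream-data/assets/"
    "ClickstreamAnalytics.json"
)

# Assumption: where your Zeppelin server listens. Adjust for your environment.
ZEPPELIN_BASE = "http://localhost:9995"


def fetch_note(url=NOTE_URL):
    """Download the notebook JSON and parse it into a dict."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def build_import_request(note, base=ZEPPELIN_BASE):
    """Build (but do not send) the POST request that imports a note."""
    body = json.dumps(note).encode("utf-8")
    return urllib.request.Request(
        base.rstrip("/") + "/api/notebook/import",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def import_note(base=ZEPPELIN_BASE, url=NOTE_URL):
    """Fetch the note and send it to Zeppelin; returns the parsed response."""
    request = build_import_request(fetch_note(url), base)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling import_note() performs the same work as steps 1 to 4. The notebook should then appear on the Zeppelin home screen just as it does after a manual import.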

Identify the State with the Most Customers

Let’s take a look at the first graph in the notebook. Take note of the following:

  1. The code in the paragraph that is run
  2. The fields that are visualized (click “settings” to open this panel)
  3. The type of graph rendered

Zeppelin States Graph

Understand Customer Demographics

Scroll down to the next section that contains a graph. Let’s dive a bit deeper and see how the visualization is built.

1. Write the query to filter demographics (age, gender, category)

%jdbc(hive)
select age, gender_cd, category from webloganalytics where age is not NULL LIMIT 1000

2. Open settings and make sure that:

  • age is dragged into the Keys area,
  • gender_cd is dragged into the Groups area,
  • category COUNT is dragged into the Values area

3. Select area chart as the visualization.

Those steps produce the following:

Zeppelin Demographics Graph

The majority of users who visit the website are in the 20-30 age range. Additionally, visitors appear to be split roughly evenly between genders.
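
To make the chart settings concrete, the sketch below reproduces the aggregation outside Zeppelin: it runs the query from step 1 and then counts rows per (age, gender_cd) pair, which is roughly what the Keys, Groups, and Values settings ask the chart to draw. It is a sketch under stated assumptions. An in-memory SQLite table stands in for the Hive table, the sample rows are invented purely for illustration, and the function name is hypothetical. How Zeppelin treats missing category values is not modeled here.

```python
import sqlite3
from collections import Counter

# Invented sample rows (age, gender_cd, category); not real tutorial data.
SAMPLE_ROWS = [
    (25, "M", "clothing"),
    (25, "F", "shoes"),
    (25, "F", "clothing"),
    (31, "M", "electronics"),
    (None, "F", "clothing"),  # dropped by the "age is not NULL" filter
]


def demographic_counts(sample_rows):
    """Run the tutorial query, then count rows per (age, gender_cd)."""
    conn = sqlite3.connect(":memory:")  # stand-in for the Hive table
    conn.execute(
        "CREATE TABLE webloganalytics "
        "(age INTEGER, gender_cd TEXT, category TEXT)"
    )
    conn.executemany(
        "INSERT INTO webloganalytics VALUES (?, ?, ?)", sample_rows
    )
    # Same query as in step 1 of the tutorial.
    rows = conn.execute(
        "select age, gender_cd, category from webloganalytics "
        "where age is not NULL LIMIT 1000"
    ).fetchall()
    conn.close()

    # Keys = age, Groups = gender_cd, Values = COUNT of category.
    counts = Counter()
    for age, gender_cd, _category in rows:
        counts[(age, gender_cd)] += 1
    return dict(counts)


counts = demographic_counts(SAMPLE_ROWS)
```

Each key in counts corresponds to one point on the area chart: the age gives the x position, the gender_cd selects the series, and the count is the height.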

Analyze Interest Category Distribution

Finally, let’s check out the last graph in this notebook. It shows that clothing is the most popular reason customers visit the website.

Zeppelin Category Graph
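
A distribution like this one comes down to a grouped count over the category column. The notebook's own query is not reproduced in this tutorial, so the following is only a sketch of one way to compute it. An in-memory SQLite table stands in for the Hive table, the sample rows are invented, and the function name and the visits alias are hypothetical.

```python
import sqlite3

# Invented sample values for the category column; not real tutorial data.
SAMPLE_CATEGORIES = [
    ("clothing",), ("clothing",), ("shoes",),
    ("clothing",), ("electronics",), ("shoes",),
]


def category_distribution(rows):
    """Return (category, visits) pairs, most visited category first."""
    conn = sqlite3.connect(":memory:")  # stand-in for the Hive table
    conn.execute("CREATE TABLE webloganalytics (category TEXT)")
    conn.executemany("INSERT INTO webloganalytics VALUES (?)", rows)
    result = conn.execute(
        "SELECT category, COUNT(*) AS visits "
        "FROM webloganalytics "
        "WHERE category IS NOT NULL "
        "GROUP BY category "
        "ORDER BY visits DESC, category ASC"  # ties broken alphabetically
    ).fetchall()
    conn.close()
    return result


distribution = category_distribution(SAMPLE_CATEGORIES)
```

The first entry of distribution is the most frequent category in whatever rows you pass in. In Zeppelin, the same GROUP BY query could be run in a %jdbc(hive) paragraph and rendered as a bar or pie chart.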

Summary

You have successfully analyzed and visualized log data with Apache Zeppelin. This and other tools can be used with the Hortonworks Data Platform to derive insights about customers from various data sources.

The data stored in the Hortonworks Data Platform can be refreshed frequently and used for basket analysis, A/B testing, personalized product recommendations, and other sales optimization activities.
