Visualize Website Clickstream Data

Visualize Log Data with Apache Zeppelin

Introduction

In this section, we will use Apache Zeppelin to access refined clickstream data.

Prerequisites

Outline

Import a Notebook into Apache Zeppelin

If you don’t have access to Microsoft Excel Professional Plus, you can use Apache Zeppelin for your data visualization instead.

Open up Ambari and make sure Zeppelin is running. As shown in the screenshot below, use the “Quick Links” dropdown menu to access the Zeppelin UI.

Open Zeppelin UI

Once the Zeppelin UI is open, click on “Import note”.

Open Zeppelin UI

Import ClickstreamAnalytics.json, which you can find here: ClickstreamAnalytics.json.

Once Zeppelin opens up, click on the correct icon in the navigation bar to display the code that goes along with the visualized data. See the following screenshot for this icon’s location.

Open Zeppelin UI

Identify the State with the Most Customers

Let’s take a look at the first graph in the notebook. Take note of the following:

  1. The code in the paragraph that is run
  2. The fields that are visualized (click “settings” to open this panel)
  3. The type of graph rendered

Zeppelin States Graph
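
The kind of aggregation behind a per-state graph can be sketched as follows. This is a minimal illustration, not the notebook's actual paragraph: it assumes webloganalytics has a column named state (not confirmed in this section), uses a handful of made-up rows, and runs against an in-memory SQLite database as a stand-in for Hive. Each row is one log entry, so the counts are visits rather than distinct customers.

```python
import sqlite3

# Hypothetical sample rows; the real webloganalytics table lives in Hive.
# The "state" column name is an assumption, not confirmed by this tutorial section.
sample_rows = [("CA",), ("CA",), ("NY",), ("TX",), ("CA",), ("NY",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE webloganalytics (state TEXT)")
conn.executemany("INSERT INTO webloganalytics VALUES (?)", sample_rows)

# Count log entries per state, largest first. A similar SELECT could be
# tried in a %jdbc(hive) paragraph if the column exists under this name.
state_query = """
SELECT state, COUNT(*) AS visits
FROM webloganalytics
GROUP BY state
ORDER BY visits DESC, state
"""
state_counts = conn.execute(state_query).fetchall()
top_state = state_counts[0][0]

print(state_counts)
print("Top state:", top_state)
```

In Zeppelin, the state column would go into Keys and the count into Values to chart a result of this shape.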

Understand Customer Demographics

Scroll down to the next section with a graph. Let’s dive a bit deeper and see how the visualization is built.

  1. Write the query to filter demographics (age, gender, category)

       %jdbc(hive)
       select age, gender_cd, category from webloganalytics where age is not NULL LIMIT 1000

  2. Open settings, make sure

    • age is dragged into the Keys area,
    • gender_cd is dragged into Groups area,
    • category COUNT is dragged into Values area
  3. Select area chart as the visualization.

Those steps produce the following:

Zeppelin Demographics Graph

The majority of users who visit the website are in the 20-30 age range. Additionally, visits appear to be split about evenly between the two genders.
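
To make the Keys, Groups and Values settings more concrete, here is a rough sketch of the pivot they describe: one point per age (Keys), one series per gender_cd (Groups), and the number of category values as the height (Values with COUNT). The rows are made up for illustration, and pivot_counts is a hypothetical helper, not part of Zeppelin.

```python
from collections import Counter, defaultdict

# Hypothetical (age, gender_cd, category) rows standing in for the query result.
demo_rows = [
    (25, "F", "clothing"),
    (25, "M", "shoes"),
    (25, "F", "shoes"),
    (31, "M", "clothing"),
    (31, "M", "electronics"),
]


def pivot_counts(rows):
    """Key by age, group by gender_cd, and count non-null category values."""
    table = defaultdict(Counter)
    for age, gender_cd, category in rows:
        if age is None or category is None:
            # Mirrors "age is not NULL"; a COUNT also skips NULL values.
            continue
        table[age][gender_cd] += 1
    # Sort by age so the series read left to right like the chart's x-axis.
    return {age: dict(groups) for age, groups in sorted(table.items())}


pivot = pivot_counts(demo_rows)
print(pivot)
```

Each inner dictionary corresponds to one x-axis position in the area chart, with one stacked value per gender code.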

Analyze Interest Category Distribution

Finally, let’s check out the last graph in this notebook. Clothing is clearly the most popular interest category among customers who visit the website.

Zeppelin Category Graph
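
A category distribution like this one can be approximated with a simple grouped count. The sketch below again uses an in-memory SQLite table and invented rows in place of the Hive webloganalytics table; only the category column name comes from the tutorial's query, and the percentages are computed in Python for illustration.

```python
import sqlite3

# Hypothetical sample rows; one NULL is included to show it being filtered out.
category_rows = [
    ("clothing",), ("shoes",), ("clothing",),
    ("electronics",), ("clothing",), (None,),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE webloganalytics (category TEXT)")
conn.executemany("INSERT INTO webloganalytics VALUES (?)", category_rows)

# Count log entries per interest category, most frequent first.
category_query = """
SELECT category, COUNT(*) AS visits
FROM webloganalytics
WHERE category IS NOT NULL
GROUP BY category
ORDER BY visits DESC, category
"""
category_counts = conn.execute(category_query).fetchall()

# Convert raw counts into percentage shares of all categorized visits.
total_visits = sum(visits for _, visits in category_counts)
category_shares = {
    category: round(100.0 * visits / total_visits, 1)
    for category, visits in category_counts
}

print(category_counts)
print(category_shares)
```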

Summary

You have successfully analyzed and visualized log data with Apache Zeppelin. Zeppelin and other BI tools can be used with the Hortonworks Data Platform to derive insights about customers from various data sources.

The data stored in the Hortonworks Data Platform can be refreshed frequently and used for basket analysis, A/B testing, personalized product recommendations, and other sales optimization activities.

Further Reading
