In this section, we will use Apache Zeppelin to access refined clickstream data.
- Have sample retail data already loaded by completing this tutorial
- Import a Notebook into Apache Zeppelin
- Identify the State with the Most Customers
- Understand Customer Demographics
- Analyze Interest Category Distribution
- Further Reading
Import a Notebook into Apache Zeppelin
If you don’t have access to Microsoft Excel Professional Plus, you can use Apache Zeppelin for your data visualization instead.
Open up Ambari and make sure Zeppelin is running. As shown in the screenshot below, use the “Quick Links” dropdown menu to access the Zeppelin UI.
Once the Zeppelin UI is open, click “Import note”.
Import ClickstreamAnalytics.json, which you can find here: ClickstreamAnalytics.json.
Once the notebook opens, click the icon in the navigation bar that displays the code behind the visualized data. See the following screenshot for this icon’s location.
Identify the State with the Most Customers
Let’s take a look at the first graph in the notebook. Take note of the following:
- The code in the paragraph that is run
- The fields that are visualized (click “settings” to open this panel)
- The type of graph rendered
Understand Customer Demographics
Scroll down to the next section with a graph. Let’s dive a bit deeper and see how the visualization is built.
- Write the query to filter demographics (age, gender, category)
%jdbc(hive) select age, gender_cd, category from webloganalytics where age is not NULL LIMIT 1000
Open settings and make sure that:
- age is dragged into the Keys area,
- gender_cd is dragged into the Groups area,
- category COUNT is dragged into the Values area,
- area chart is selected as the visualization.
Those steps produce the following:
The majority of users who visit the website are in the 20-30 age range. Additionally, the split between genders appears to be roughly even.
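To make the Keys, Groups, and Values settings more concrete, here is a minimal Python sketch of the grouped count the pivot appears to compute: one count per (age, gender_cd) pair. The sample rows and the `pivot_count` helper are hypothetical and only illustrate the idea; they are not taken from the webloganalytics table or from Zeppelin’s own implementation.

```python
from collections import Counter

# Hypothetical rows shaped like the query result: (age, gender_cd, category).
# The values are invented for illustration only.
rows = [
    (24, "F", "clothing"),
    (24, "F", "handbags"),
    (24, "M", "shoes"),
    (31, "M", "clothing"),
    (45, "F", "movies"),
]


def pivot_count(rows):
    """Count rows per (age, gender_cd): Keys = age, Groups = gender_cd, Values = COUNT."""
    counts = Counter()
    for age, gender_cd, _category in rows:
        counts[(age, gender_cd)] += 1
    return counts


counts = pivot_count(rows)

# Each key (age) becomes a point on the x-axis, and each group (gender_cd) a separate series.
for (age, gender_cd), n in sorted(counts.items()):
    print(age, gender_cd, n)
```

Reading the output by age shows how many visits each gender contributes at that age, which is the same comparison the area chart makes visually.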
Analyze Interest Category Distribution
Finally, let’s check out the last graph in this notebook. Clothing is clearly the most popular reason customers visit the website.
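A category distribution like this one boils down to counting how often each category value occurs and ranking the results. The sketch below shows that calculation in Python, assuming a hypothetical list of category values; the `category_distribution` helper and the sample data are illustrative and do not reflect the actual contents of the notebook.

```python
from collections import Counter

# Hypothetical category values, invented for illustration only.
categories = ["clothing", "shoes", "clothing", "handbags", "clothing", "shoes"]


def category_distribution(categories):
    """Return (category, count, share) tuples, most frequent category first."""
    total = len(categories)
    return [
        (name, n, n / total)
        for name, n in Counter(categories).most_common()
    ]


distribution = category_distribution(categories)

for name, n, share in distribution:
    print(f"{name}: {n} ({share:.0%})")
```

The first entry in the result is the most popular category, which is what the tallest segment of the chart represents.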
You have successfully analyzed and visualized log data with Apache Zeppelin. This and other BI tools can be used with the Hortonworks Data Platform to derive insights about customers from various data sources.
The data stored in the Hortonworks Data Platform can be refreshed frequently and used for basket analysis, A/B testing, personalized product recommendations, and other sales optimization activities.