Get fresh updates from Hortonworks by email

Once a month, receive latest insights, trends, analytics information and knowledge of Big Data.

cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button
HDP > Develop with Hadoop > Hello World

Hadoop Tutorial – Getting Started with HDP

Data Reporting with Microsoft Excel for Windows

cloud Ready to Get Started?

DOWNLOAD SANDBOX

Data Reporting with Excel

Introduction

In this tutorial, using Microsoft Excel 2016 for Windows and Power View, we’ll visulaize data from previous sections of this tutorial. You may use other Businiess Intellegence (BI) tools of your choice.

Prerequisites

The tutorial is a part of series of hands on tutorial to get you started on HDP using Hortonworks sandbox. Please ensure you complete the prerequisites before proceeding with this tutorial.

Outline

Access Data in Microsoft Excel

Let’s bring in data from table avg_mileage. We created this table in the Hive – Data ETL section.

1. Open a new blank workbook.

2. Select Data > From Other Sources > From Microsoft Query

excel-open-query

3. On the Choose Data Source pop-up, select the Hortonworks ODBC data source you installed previously, then click OK.

excel-choose-data-source

4. In the Query Wizard, select the avg_mileage table and add columns to the query, then click Next.

excel-choose-columns

5. For the following Query Wizard forms, accept the defaults and click Next.

excel-filter-data

excel-sort-order

On this last form, click Finish.

excel-finish

6. Excel will send a data request to Hive. When data is returned, it will ask you where to import the table. Accept the default location to import the table: current workbook, current worksheet, in cell $A$1 – click OK.

excel-import-data

We have successfully imported table avg_mileage into Excel. Now we are ready to do some visualization.

Lab5_7

Visualize Data with Microsoft Excel

We will use Power View to visulaize our data.

1. click on excel-power-view-icon. You created this icon as part of the Power View prerequisite. The default is to create Power View sheet, click OK.

2. We will create a column chart to visually describe the average miles per gallon for each truck. Select DESIGN > Column Chart > Stacked Column. You will need to stretch the chart by dragging the lower right of the chart to the full pane. You can control the amount of data you see by filtering on avgmpg and/or truckid.

excel-select-stacked-column

Lab5_9

Moving on to our next visual…

We’ll be using data from table geolocation. We created this table in the Hive – Data ETL section. We will create a geographical map describing the location of each truck. We’ll use the following query to gather driverid, city, and state from the table.

SELECT driverid, city, state FROM geolocation;

1. Select Data > New Query > From Other Sources > From ODBC

excel-new-query

2. Fill out From ODBC form as follows:

  • Data source name (DSN): <data source name you created>
  • Under Advanced options, SQL statement (optional), type: SELECT driverid, city, state FROM geolocation;
  • press OK

excel-query

3. Excel will display a sample set of the query results. Click Load to create a new sheet and import the data.

excel-query-sample

4. click on excel-power-view-icon. You created this icon as part of the Power View prerequisite. The default is to create Power View sheet, click OK.

5. We will create a map to visually describe the location of each truck. Select DESIGN > Map. You will need to stretch the chart by dragging the lower right of the chart to the full pane.

Make sure you have network connectivity because Power View uses Bing to do geocoding which translates city and state columns into map coordinates.

excel-select-map

  • Uncheck driverid. We only want to see city and state.

The finished map looks like this.

Lab5_15

Summary

Congratulations! You are able to visualize your data using Microsoft Excel.

This tutorial has shown how Hortonworks Data Platform (HDP) can store and visualize geolocation data with Microsoft Excel. There are many other Business Intelligent (BI) tools available you can use.

You can further explorer other visualization, such as plotting risk factor or miles per gallon as bar charts.

Further Reading

As a Hadoop practitioner you can choose three basic personas to build upon your skill:

Case Studies

Learn more about Hadoop through these case studies: