cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button

Analyze HVAC Machine and Sensor Data

Introduction

Version 1.0 for HDP 2.5 January 25, 2017

Hortonworks Data Platform (HDP) can be used to to refine and analyze data from heating, ventilation, and air conditioning (HVAC) systems to maintain optimal office building temperatures and minimize expenses.

Demo video Enable Predictive Analytics with Hadoop illustrates what you’ll be building in this tutorial.

Sensor Data

A sensor is a device that measures a physical quantity and transforms it into a digital signal. Sensors are always on, capturing data at a low cost, and powering the “Internet of Things.”

Use of Sensor Data

In this tutorial, we will focus on sensor data from building operations. Specifically, we will refine and analyze the data from Heating, Ventilation, Air Conditioning (HVAC) systems in 20 large buildings around the world.

Prerequisites:

Tutorial Series Overview

In this tutorial, we work with HVAC sensor data from 20 buildings from different countries. Learn a method to refine and analyze HVAC data:

Lab1 – Upload and Refine Data with Hive

  • Download and extract the sensor data files to your local machine.
  • Upload and cleanse HVAC data with Apache Hive running on HDP sandbox.

Lab 2 – Visualize HVAC Data Via Zeppelin

  • Access the refined HVAC data.
  • Visualize HVAC data to determine buildings that need HVAC units.

Lab 1 - Upload and Refine Data with Hive

Introduction

You will learn to refine HVAC data into a useful format. You will gain a practical experience with creating Hive tables that run on ORC files for fast and efficient data processing. You will gain insight to writing Hive scripts that enrich our data to reveal to us when temperature is at a cold, normal or hot state. Additionally, you will learn to write Hive queries on the data to determine which particular buildings are associated with these temperature states.

Prerequisites

Read through the Introduction

Download and Extract the Sensor Data Files

1. Download the sample sensor data contained in a compressed (.zip) folder here:

2. Save the SensorFiles.zip file to your computer, then extract the files. You should see a SensorFiles folder that contains the following files:

  • HVAC.csv – contains the targeted building temperatures, along with the actual (measured) building temperatures. The building temperature data was obtained using Apache Flume. Flume can be usedas a log aggregator, collecting log data from many diverse sources and moving it to a centralized data store. In this case, Flume was used to capture the sensor log data, which we can now load into the Hadoop Distributed File System (HFDS). For more details on Flume, refer to Tutorial 13: Refining and Visualizing Sentiment Data

  • building.csv – contains the “building” database table. Apache Sqoop can be used to transfer this type of data from a structured database into HFDS.

Outline

Step 1: Upload HVAC Data To Hive Table

1. Login to Ambari login at http://localhost:8080

  • username: maria_dev
  • password maria_dev

2. Use the dropdown menu in the top right corner to access the Hive view:

Navigate to the Upload Table tab, upload the two csv files contained within SensorFiles.zip

When uploading the two tables we’ll need to change a few things.

For HVAC.csv

1. Change the table name to hvac_raw
2. Change the name of the Date column to date_str
3. If TargetTemp's type is STRING, change it to INT
4. If ActualTemp's type is STRING, change it to INT

As the table uploads, you’ll notice a progress window appear.

For buildings.csv

1. Change the table name to building_raw

We have both tables loaded in, we want to get better performance in Hive, so we’re going to create new tables that utilize the highly efficient ORC file format. This will allow for faster queries when our datasets are much much larger.

Step 2: Create HVAC and Buildings ORC Hive Tables

1. Create a new table hvac that is stored as an ORC file.

CREATE TABLE hvac STORED AS ORC AS SELECT * FROM hvac_raw;

2. Use a similar Hive query to create the buildings table.

CREATE TABLE buildings STORED AS ORC AS SELECT * FROM building_raw;

Step 3: Enrich Sensor Data Via Hive Scripts

We will use two Hive scripts to refine the sensor data. The knowledge we seek to acquire from this data:

  • Reduce heating and cooling expenses.
  • Keep indoor temperatures in a comfortable range between 65-70 degrees.
  • Identify which HVAC products are reliable, and replace unreliable equipment with those models.

We will identify whether the actual temperature was more than five degrees different from the target temperature with a Hive script.

1. Create a new worksheet in the Hive view and paste the following Hive query into the editor:

CREATE TABLE hvac_temperatures as
select *, targettemp - actualtemp as temp_diff,
IF((targettemp - actualtemp) > 5, 'COLD',
IF((targettemp - actualtemp) < -5, 'HOT', 'NORMAL'))
AS temprange,
IF((targettemp - actualtemp) > 5, '1',
IF((targettemp - actualtemp) < -5, '1', 0))
AS extremetemp from hvac;

What’s this query does?

  • Creates a new table hvac_temperatures and copies data from the hvac table

2. Use Execute to create the new table.

  • On the Query Results page, use the slider to scroll to the right. You will notice that two new attributes appear in the hvac_temperatures table.

What are the two new attributes?

  • temprange and extremetemp

The data in the temprange column indicates whether the actual temperature was:

  • NORMAL within 5 degrees of the target temperature.
  • COLD more than five degrees colder than the target temperature.
  • HOT more than 5 degrees warmer than the target temperature.

If the temperature is outside of the normal range, extremetemp is assigned a value of 1; otherwise its value is 0.

3. Let’s combine the hvac and hvac_temperatures data sets.

Create a new worksheet in the hive view and use the following query to create a new table hvac_building that contains data from the hvac_temperatures table and the buildings table.

create table if not exists hvac_building
as select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
from buildings b join hvac_temperatures h on b.buildingid = h.buildingid;

Which tables is hvac_building’s data coming from?

  • hvac_temperature and buildings

Use Execute to run the query:

After you’ve successfully executed the query, use the database explorer to load a sample of the data from the new hvac_building table.

The aggregated data shows us the buildings that are associated with a certain temprange.

Summary

We’ve successfully refined the data into a useful format. We learned to create Hive tables that run on ORC files for fast and efficient data processing. We learned to write Hive scripts to enrich our data to reveal to us when temperature is at a cold, normal or hot state. Additionally, we used the data to bring us insight into which particular buildings are associated with these temperature states. Our next step is to use different reporting tools to analyze the results.

Further Reading

Lab 2 - Visualize HVAC Data Via Zeppelin

Introduction

You will learn to gain more insights on your data by visualizing certain key attributes using Apache Zeppelin. You will to learn to visualize a graph that shows the countries that have the most extreme temperature and the amount of NORMAL events there are compared to HOT and COLD. You will also learn to see which HVAC units result in the most extremetemp readings.

Prerequisites

  • Lab 1 – Upload and Refine Data with Hive

Outline

Apache Zeppelin

Apache Zeppelin is data reporting tool on Hadoop. Zeppelin integrates with Hadoop by using tools called interpreters. It has direct connections to Apache Spark and Hive in your cluster and allows you to create visualizations and analyze your data on the fly.
Apache Zeppelin gives you the power to connect right to your Hadoop cluster to quickly obtain results from the data inside of Hadoop without having to export data to any other sources.

It’s also important to note that Zeppelin contains several interpreters that can be utilized to obtain data in a variety of ways.
One of the default interpreters included with Zeppelin is for Apache Spark. With the popularity of Apache Spark rising, you can simply write Spark scripts to execute directly on Apache Zeppelin to obtain results from your data in a matter of seconds.

Step 1: Visualize Temperature Ranges of Countries

1. Access Zeppelin at localhost:9995

From here we’re going to need to create a new Zeppelin Notebook. Notebooks in Zeppelin is how we differentiate reports from one another.

2. Hover over Notebook. Use the dropdown menu and Create a new note.

3. Name the note HVAC Analysis Report and then Create Note.

We will use the Hive interpreter to run Hive queries and visualize the results in Zeppelin.

4. To access the Hive interpreter for this note, we must insert %jdbc(hive) at the top of the note. Everything afterwards will be interpreted as a Hive query.

5. Type the following query into the note, then run it by clicking the Run arrow or by using the shortcut Shift+Enter.

%jdbc(hive)

select country, extremetemp, temprange from hvac_building

Now that the query is run, let’s visualize the data with a chart.

6. Select the bar graph chart button located just under the query.

7. Click settings to open up more advanced settings for creating the chart.

8. Here you will experiment with different values and columns to customize data that is illustrated in your visualization.

  • Arrange the fields according to the following image.
  • Drag the field temprange into the groups box.
  • Click SUM on extremetemp and change it to COUNT.
  • Make sure that country is the only field under Keys.

You’ve just customized your chart’s settings to portray the countries and their temperature from cold, normal to hot using Apache Zeppelin.

  • From the chart above we can see which countries have the most extreme temperature and how many NORMAL events there are compared to HOT and COLD.

Step 2: Determine Which Buildings Need HVAC Upgrades

Is it possible to figure out which buildings might need HVAC upgrades, and which do not? Let’s determine that answer in the steps ahead…

  • Let’s try creating one more note to visualize which types of HVAC systems result in the least amount of extremetemp readings.

9. Paste the following query into the blank Zeppelin note following the chart we made previously.

%jdbc(hive)

select hvacproduct, extremetemp from hvac_building

10. Use Shift+Enter to run the note.

11. Arrange the fields according to the following image so we can recreate the chart below.

  • Make sure that hvacproduct is in the Keys box.
  • Make sure that extremetemp is in the Values box and that it is set to COUNT.

  • Now we can see which HVAC units result in the most extremetemp readings. Thus we can make a more informed decision when purchasing new HVAC systems.

Summary

We’ve successfully gained more insights on our data by visualizing certain key attributes using Apache Zeppelin. We learned to visualize a graph that shows the countries that have the most extreme temperature and the amount of NORMAL events there are compared to HOT and COLD. We learned to see which HVAC units result in the most extremetemp readings.

Further Reading