Hadoop has always been associated with big data, yet the perception persists that it is only suitable for high-latency, high-throughput batch queries. Thanks to contributions from the community, you can now use Hadoop interactively for data exploration and visualization. In this tutorial you'll learn how to analyze large datasets with Apache Hive LLAP on Amazon Web Services (AWS) using Business Intelligence (BI) tools such as Tableau.
Hortonworks Data Cloud (HDCloud) for AWS is a platform for analyzing and processing data, enabling businesses to achieve insights more quickly and with greater flexibility than ever before. Instead of wading through endless configuration options, you choose from a set of prescriptive cluster configurations and can start modeling and analyzing your data sets in minutes. When your analysis is done, you can return the resources to the cloud, reducing your costs. See Get Started for general instructions on how to launch a cloud controller instance and create clusters.
A few modifications are required when creating a cluster to take advantage of Hive LLAP. After you log in to AWS, search for or navigate to the CloudFormation service:
From the CloudFormation service screen, open the CloudURL:
Log in using your credentials:
Create a cluster as you normally would, with the following exceptions:
Settings should look similar to:
Once the cluster is created and running, log in to Ambari:
Now we'll go into Hive View 2.0 and create the tables needed for our analysis:
Copy and paste the DDL into the Hive Query Editor and execute it:
Copy and paste the following command into the Hive Query Editor to load all of the partitions from S3. We now have some data to play with:
MSCK REPAIR TABLE flights;
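For context, the DDL for this tutorial defines tables backed by data in S3. Below is a minimal sketch of what such an external, partitioned table definition could look like; the column list, partition key, and bucket path are illustrative assumptions, not the tutorial's actual DDL:

```sql
-- Illustrative sketch only: column names, partition key, and S3 path
-- are assumptions, not the tutorial's actual DDL.
CREATE EXTERNAL TABLE IF NOT EXISTS flights (
  flightdate    STRING,
  uniquecarrier STRING,
  tailnum       STRING,
  origin        STRING,
  dest          STRING
)
PARTITIONED BY (`year` INT)
STORED AS ORC
LOCATION 's3a://example-bucket/flights/';

-- MSCK REPAIR TABLE scans the table's storage location and registers
-- any partition directories not yet known to the Hive metastore,
-- which is why it is run after the tables are created.
MSCK REPAIR TABLE flights;
```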
Read the documentation for details on connecting with Tableau.
We'll need the master node's IP address so we can connect to it from Tableau. Capture the IP address from Ambari:
Start up Tableau and connect using the "Hortonworks Hadoop Hive" server type. If it is not displayed, click "More…":
Connect to Hortonworks Hadoop Hive using the master node's IP address.
The settings should look like this:
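Before configuring Tableau, you can sanity-check connectivity from the command line with Beeline. In this sketch the IP address, port, and credentials are placeholders to substitute with your own values; port 10500 is the usual default for the LLAP-enabled HiveServer2 Interactive endpoint on HDP, but verify it in Ambari:

```shell
# Placeholder values: replace MASTER_NODE_IP, the port, and the
# credentials with your cluster's actual settings from Ambari.
# 10500 is typically the HiveServer2 Interactive (LLAP) port.
beeline -u "jdbc:hive2://MASTER_NODE_IP:10500/default" \
        -n admin -p 'your-password' \
        -e "SHOW DATABASES;"
```

If this returns the database list (including hwxdemo), Tableau should be able to connect with the same host, port, and credentials.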
Now that we’re connected with Tableau, we’ll select the “hwxdemo” schema for our analysis.
We'll build our data model from the following four tables, loaded in order and joined with the associations described below:
- Uniquecarrier = Code
- Tailnum = Tailnum (Planes)
- Origin = Iata
- Dest = Iata (Airports)
The final table associations should look like this:
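Expressed as a Hive query, the joins behind this data model might look like the sketch below. The table and column names (airlines, planes, airports, and the selected columns) are assumptions based on the hwxdemo demo schema, not verified against it:

```sql
-- Sketch of the joins Tableau performs for this data model.
-- Table and column names are assumed from the hwxdemo schema.
SELECT
  f.flightdate,
  a.description AS carrier,       -- Uniquecarrier = Code
  p.model       AS plane_model,   -- Tailnum = Tailnum
  o.airport     AS origin_airport,-- Origin = Iata
  d.airport     AS dest_airport   -- Dest = Iata
FROM flights  f
JOIN airlines a ON f.uniquecarrier = a.code
JOIN planes   p ON f.tailnum       = p.tailnum
JOIN airports o ON f.origin        = o.iata
JOIN airports d ON f.dest          = d.iata
LIMIT 10;
```

Note that the airports table is joined twice, once for the origin and once for the destination, which is why it appears under two associations above.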
Congratulations! You’ve completed the tutorial.
As this tutorial shows, it is easy to use Hive LLAP on Amazon Web Services (AWS). With BI tools like Tableau, you can interactively explore and analyze your data.