HOW TO: Connect Tableau to Hortonworks Sandbox

Tableau, Apache Hive and the Hortonworks Sandbox

As with most BI tools, Tableau can use Apache Hive, the de facto standard for SQL access in Hadoop, through an ODBC connection. Establishing a connection from Tableau to Hadoop and the Hortonworks Sandbox is straightforward, and this article describes the process.

1. Install Tableau

To get started, download and install Tableau from the Tableau web site. At the time of writing, Tableau is a Windows-only application.

2. Install & Configure the Windows 32-bit ODBC Driver

Once Tableau is installed, download the Windows 32-bit Hive ODBC driver here.

After installing the driver, configure it by running the DriverConfiguration32.exe utility, which you can find in the Start menu or by searching from the Start page. Set Hive Server Type to 2 (HiveServer2), set Authentication Mechanism to “User Name”, and set User Name to “sandbox”.
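
If you want to reuse the same driver settings outside the configuration dialog, they can be written as a DSN-less ODBC connection string. The Python sketch below assembles one. The driver name, the attribute keys (Host, Port, HiveServerType, AuthMech, UID, Schema), the AuthMech value 2 for “User Name” and the port 10000 are assumptions based on common Simba-based Hive ODBC driver conventions, not values stated in this article, so check them against your driver's documentation.

```python
# A minimal sketch of a DSN-less ODBC connection string that mirrors the
# settings chosen in DriverConfiguration32.exe.
# ASSUMPTIONS: the driver name, the attribute keys, AuthMech=2 meaning
# "User Name" and port 10000 follow common Simba-based Hive ODBC driver
# conventions; verify them in your driver's install guide.


def build_hive_connection_string(
    host: str,
    user: str = "sandbox",
    port: int = 10000,  # assumed default HiveServer2 port
    schema: str = "default",
    driver: str = "Hortonworks Hive ODBC Driver",  # assumed driver name
) -> str:
    """Return an ODBC connection string for HiveServer2 with User Name auth."""
    attrs = {
        "Driver": "{" + driver + "}",  # braces let the name contain spaces
        "Host": host,
        "Port": str(port),
        "HiveServerType": "2",  # 2 = HiveServer2, as set in the dialog
        "AuthMech": "2",  # assumed: 2 = "User Name" authentication
        "UID": user,  # the "sandbox" user from the dialog
        "Schema": schema,
    }
    # ODBC connection strings are semicolon-separated key=value pairs.
    return ";".join(f"{key}={value}" for key, value in attrs.items())
```

For example, build_hive_connection_string("192.168.56.101") returns Driver={Hortonworks Hive ODBC Driver};Host=192.168.56.101;Port=10000;HiveServerType=2;AuthMech=2;UID=sandbox;Schema=default.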

3. Connect to Hadoop as a Data Source

Start Tableau and select Connect to Data from the Data menu.

Tableau presents a list of data source options. In the left panel, select the “Hortonworks Hadoop Hive” server option.

Tableau then opens the “Hortonworks Hadoop Hive Connection” configuration dialog. Enter the IP address of the Sandbox VM (typically 192.168.56.101) and click “Connect” to establish the connection.
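
If Tableau cannot connect, a quick first check is whether the Sandbox VM accepts TCP connections at all. The Python sketch below uses only the standard library; is_port_open is an illustrative name, and the port 10000 is an assumption (the usual HiveServer2 default) rather than something stated in this article.

```python
import socket

# A minimal reachability check to run before clicking "Connect" in Tableau.
# ASSUMPTION: HiveServer2 listens on its usual default port, 10000; your
# Sandbox may be configured differently.


def is_port_open(host: str, port: int = 10000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

On a working setup, is_port_open("192.168.56.101") should return True. If it returns False, confirm that the VM is running and that the address matches your Sandbox VM's IP address.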

Next, set the Schema to default. Then, under Table, click the spyglass (search) icon to list the tables in Hive.
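
To see the same table list outside Tableau, you can run HiveQL against the default schema yourself. The sketch below is hypothetical helper code, not part of Tableau or the driver: list_hive_tables is an illustrative name, and conn is assumed to be any Python DB-API 2.0 connection to Hive, such as one opened with the third-party pyodbc package.

```python
from typing import List

# A minimal sketch that lists the tables in a Hive schema with HiveQL.
# ASSUMPTION: `conn` is a DB-API 2.0 connection to Hive (e.g. via pyodbc);
# this is not necessarily the query Tableau itself issues.


def list_hive_tables(conn, schema: str = "default") -> List[str]:
    """Return the table names in `schema`, sorted alphabetically."""
    if not schema.replace("_", "").isalnum():
        # The schema name is interpolated into HiveQL, so reject anything odd.
        raise ValueError(f"unexpected schema name: {schema!r}")
    cursor = conn.cursor()
    try:
        cursor.execute(f"SHOW TABLES IN {schema}")
        # Each row returned by SHOW TABLES holds one column: the table name.
        return sorted(row[0] for row in cursor.fetchall())
    finally:
        cursor.close()
```

Assuming pyodbc is installed and conn_str holds an ODBC connection string for the driver configured in step 2, you could call list_hive_tables(pyodbc.connect(conn_str, autocommit=True)); autocommit=True is the setting commonly suggested for Hive ODBC connections.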

Select the tables you want to use (this example uses tweetsbi) and click “OK” at the bottom of the dialog to import the data.

You can either use a Live connection, which queries Hive for the data as you need it, or import some or all of the data at once. Choose the option that suits your analysis.
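
If you are unsure which option to choose, the size of the table is one useful input: importing copies the rows into Tableau, while a Live connection sends queries to Hive as you work. The hypothetical count_rows helper below (an illustrative name, assuming any DB-API 2.0 connection to Hive, for example via pyodbc) returns a table's row count. Keep in mind that COUNT(*) runs a full Hive query and can take a while on large tables.

```python
import re

# A minimal sketch for sizing a Hive table before choosing Live vs. import.
# ASSUMPTION: `conn` is a DB-API 2.0 connection to Hive (e.g. via pyodbc).

# Accepts "table" or "schema.table" made of letters, digits and underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def count_rows(conn, table: str) -> int:
    """Return the number of rows in `table` using SELECT COUNT(*)."""
    if not _IDENTIFIER.fullmatch(table):
        # The table name is interpolated into HiveQL, so validate it first.
        raise ValueError(f"unexpected table name: {table!r}")
    cursor = conn.cursor()
    try:
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        (count,) = cursor.fetchone()  # a single row with a single column
        return int(count)
    finally:
        cursor.close()
```

For example, count_rows(conn, "tweetsbi") gives a rough sense of how much data an import would copy.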

4. Visualize

Once the data is imported, you are ready to use Tableau to visualize data in Hadoop and the Hortonworks Sandbox.

You can find many more tutorials to explore with the Hortonworks Sandbox!