HOW TO: Connect Tableau to Hortonworks Sandbox

Tableau, Apache Hive and the Hortonworks Sandbox

As with most BI tools, Tableau can use Apache Hive (via an ODBC connection), the de facto standard for SQL access in Hadoop. Establishing a connection from Tableau to Hadoop and the Hortonworks Sandbox is fairly straightforward, and this tutorial describes the process.

1. Install Tableau

To get started, download and install Tableau from the Tableau web site. Tableau is a Windows-only application.

2. Install & Configure the Windows 32-bit ODBC Driver

Once Tableau is installed, download the 32-bit Windows ODBC driver for Hive.

After installing the driver, configure it by running the DriverConfiguration32.exe utility, which you can find in the Start menu or by searching from the Start page. Set Hive Server Type to 2 (HiveServer2), set Authentication Mechanism to “User Name”, and set User Name to “sandbox”.
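The same settings can also be expressed as a DSN-less ODBC connection string, which is handy for testing the driver outside Tableau. The Python sketch below is illustrative only: it assumes the key names used by the Simba-based Hortonworks Hive ODBC driver (Host, Port, Schema, HiveServerType, AuthMech, UID), the driver name “Hortonworks Hive ODBC Driver”, AuthMech=2 as the code for “User Name” authentication, and HiveServer2's default port 10000. Confirm each of these against your driver's install guide; the helper name build_hive_conn_str is hypothetical.

```python
from typing import Dict


def build_hive_conn_str(host: str,
                        port: int = 10000,
                        user: str = "sandbox",
                        schema: str = "default",
                        driver: str = "Hortonworks Hive ODBC Driver") -> str:
    """Assemble a DSN-less ODBC connection string (hypothetical helper).

    Key names and numeric codes are assumptions based on the Simba-based
    Hortonworks Hive ODBC driver; verify them for your driver version.
    """
    settings: Dict[str, str] = {
        "Driver": "{" + driver + "}",  # braces let the driver name contain spaces
        "Host": host,                  # IP address of the Sandbox VM
        "Port": str(port),             # HiveServer2 listens on 10000 by default
        "Schema": schema,
        "HiveServerType": "2",         # 2 = HiveServer2, matching "Hive Server Type 2"
        "AuthMech": "2",               # assumed code for "User Name" authentication
        "UID": user,                   # "sandbox", as in the DSN configuration
    }
    # ODBC connection strings are semicolon-separated key=value pairs
    return ";".join(f"{key}={value}" for key, value in settings.items())


# SANDBOX_IP is a placeholder; substitute your Sandbox VM's IP address.
print(build_hive_conn_str("SANDBOX_IP"))
```

If pyodbc is installed, passing this string to pyodbc.connect(conn_str, autocommit=True) should open a connection; autocommit is usually enabled because Hive does not support transactions.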


3. Connect to Hadoop as a Data Source

Start Tableau and select Connect to Data from the Data menu.


Tableau presents a list of data source options. In the left panel, select the “Hortonworks Hadoop Hive” server option.


Tableau then presents the “Hortonworks Hadoop Hive Connection” configuration dialog. Enter the IP address of the Sandbox VM and click “Connect” to establish the connection.
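If “Connect” fails, a quick first check is whether the Sandbox accepts TCP connections from the Windows machine at all. The minimal Python sketch below assumes HiveServer2 listens on its default port 10000 and that this port is reachable from Windows (for example through the VM's network or port-forwarding settings); the function name hiveserver2_reachable is hypothetical.

```python
import socket


def hiveserver2_reachable(host: str, port: int = 10000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened (hypothetical helper)."""
    try:
        # create_connection resolves the address and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False
```

Calling hiveserver2_reachable with the Sandbox VM's IP address only shows that something is listening on the port; it does not validate the Hive credentials or the driver configuration.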


Next, set Schema to “default”. Then go to Table and click the spyglass (search) icon to get a list of the tables in Hive.
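The table list Tableau shows corresponds to Hive's SHOW TABLES statement. The Python sketch below runs the equivalent query; it assumes an already open DB-API 2.0 connection to Hive (for example one created with pyodbc and the ODBC driver configured in step 2), and the function name list_hive_tables is hypothetical.

```python
import re
from typing import List

# Hive database names consist of letters, digits and underscores
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def list_hive_tables(conn, schema: str = "default") -> List[str]:
    """Return the table names in `schema` using an open DB-API connection to Hive."""
    # Reject anything that is not a plain identifier before building the SQL
    if not _IDENTIFIER.fullmatch(schema):
        raise ValueError(f"unexpected schema name: {schema!r}")
    cursor = conn.cursor()
    try:
        # HiveQL syntax: SHOW TABLES [IN database_name]
        cursor.execute(f"SHOW TABLES IN {schema}")
        # Each result row holds a single column containing the table name
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()
```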


Select the tables you want to use (this example uses tweetsbi) and click “OK” at the bottom of the dialog to import the data.
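Before committing to a table, it can be useful to look at a handful of its rows outside Tableau. The Python sketch below fetches the column names and a few rows with a LIMIT query; it assumes an open DB-API 2.0 connection to Hive (for example through pyodbc and the ODBC driver from step 2), and the helper name preview_table is hypothetical.

```python
import re
from typing import Any, List, Sequence, Tuple

# Hive table names consist of letters, digits and underscores
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def preview_table(conn, table: str = "tweetsbi",
                  limit: int = 10) -> Tuple[List[str], List[Sequence[Any]]]:
    """Return (column names, up to `limit` rows) for a Hive table (hypothetical helper)."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    cursor = conn.cursor()
    try:
        # LIMIT caps how many rows are sent back to the client
        cursor.execute(f"SELECT * FROM {table} LIMIT {limit}")
        # DB-API: description is a sequence of tuples whose first item is the column name
        columns = [col[0] for col in cursor.description]
        return columns, cursor.fetchall()
    finally:
        cursor.close()
```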


You can either use a Live connection, which retrieves data from Hive as you need it, or import some or all of the data at once. Choose the option that suits your analysis.


4. Visualize

Once the data is imported, you are ready to use Tableau to visualize data in Hadoop and the Hortonworks Sandbox.



You can find many more tutorials to explore with the Hortonworks Sandbox!
