New in HDP2: Encrypted communication with Hive between Hadoop and Analytics Tools

Security is one of the biggest topics in Hadoop right now. Historically, Hadoop has been a back-end system accessed only by a few specialists, but the clear trend is for companies to put data from Hadoop clusters in the hands of analysts, marketers, product managers and call center employees, whose numbers could be in the hundreds or thousands. Data security and privacy controls are necessary before this transformation can occur. HDP2, through the next release of Apache Hive, introduces a very important new security feature that allows you to encrypt the traffic that flows between Hadoop and popular analytics tools like MicroStrategy, Tableau, Excel and others.

This blog will explore this topic in more detail, as well as show you how you can configure this feature and try it out for yourself today.

Architecture of Hadoop Usage

Analytics tools like Tableau execute queries on Hadoop through a component called HiveServer2. HiveServer2 provides ODBC and JDBC connectivity to Hadoop and effectively serves as a gateway through which SQL queries are routed to Hadoop. This makes HiveServer2 a convenient single point that, when secured, ensures data privacy to analytics users.

Since its beginning, HiveServer2 has offered Kerberos authentication for secure clusters, as well as the ability to run Hive queries as the authenticated user (the so-called doAs feature). Until now, however, all communication between the Hadoop cluster and its clients has been unencrypted. This was a problem for anyone who needed to expose sensitive data outside their secure environment.

One customer that felt this pain very keenly was Yahoo, which is busily deploying BI tools to its analysts. The lack of encryption was enough of a show-stopper at Yahoo that Arup Malakar and Chris Drome from Yahoo implemented HIVE-4991, which adds SASL QoP support to HiveServer2 and allows encryption to be required through a server-side variable.
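
For readers new to SASL, "QoP" stands for quality of protection. SASL defines three levels, and only the strongest one encrypts traffic. The short Python sketch below is purely illustrative and is not code from Hive; the dictionary and the helper name encrypts_traffic are made up for this post.

```python
# SASL quality-of-protection (QoP) levels, from weakest to strongest.
# The keys are the standard SASL tokens; the descriptions and the helper
# below are illustrative only and are not part of Hive.
SASL_QOP_LEVELS = {
    "auth": "authentication only",
    "auth-int": "authentication plus integrity checking",
    "auth-conf": "authentication plus integrity checking and encryption",
}


def encrypts_traffic(qop: str) -> bool:
    """Return True only for the QoP level that provides confidentiality."""
    if qop not in SASL_QOP_LEVELS:
        raise ValueError(f"unknown SASL QoP level: {qop!r}")
    return qop == "auth-conf"


if __name__ == "__main__":
    for level, meaning in SASL_QOP_LEVELS.items():
        print(f"{level:<10} {meaning} (encrypted: {encrypts_traffic(level)})")
```

In other words, a server that insists on auth-conf is a server that insists on encrypted connections.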

Try it yourself

If your interest is piqued, you can try it today, and you might even be surprised to find that you can be up and running in just a few minutes. HiveServer2 encryption is included as part of our HDP 2.0 Beta.

Here’s how:

Step 1: Install HDP in secure mode.

A kerberized cluster is required for this feature. Visit hortonworks.com/download to get started with HDP 2.0 Beta. Of course, the feature will also be part of HDP 2.0 GA and future 2.x versions of HDP.

Step 2: Configure HDP to negotiate encrypted connections.

We’ll use Ambari to make this configuration change, so start by logging in to Ambari.

Step 2.1: Select the Hive/HCat service under Services.

Step 2.2: Stop Hive. This is necessary to make configuration changes.

Step 2.3: Confirm that you want to stop Hive.

Step 2.4: When Hive is stopped, select OK.

Step 2.5: Select the Configs tab.

Step 2.6: Select Custom hive-site.xml.

Step 2.7: Select Add Property…

Step 2.8: Enter Key hive.server2.thrift.sasl.qop, Value auth-conf.

Step 2.9: Save the new property.

Step 2.10: Start Hive.
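
If you maintain hive-site.xml by hand rather than through Ambari, the property entered in Step 2.8 corresponds to an ordinary Hadoop-style property element. The Python sketch below only renders that element so you can see its shape. The function name render_property is hypothetical, and the script does not touch any cluster configuration.

```python
import xml.etree.ElementTree as ET


def render_property(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as an XML string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Key and value are the ones from Step 2.8.
    print(render_property("hive.server2.thrift.sasl.qop", "auth-conf"))
```

The rendered element would sit inside the configuration root of hive-site.xml, alongside the other Hive properties.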

Step 3: Install and configure the Hortonworks Hive ODBC Driver.

Download the Hortonworks Hive ODBC Driver from our add-ons page and follow the installation instructions for your platform. If you are on a Mac, the Hortonworks Sandbox 1.3 has a tutorial that shows exactly how to install the ODBC driver, and I recommend following it because the install is complex. The Sandbox helps with ODBC setup, but you will need a full cluster to try the encryption feature because the Sandbox doesn’t support Kerberos. Once you have installed the driver, define an ODBC data source that points to your cluster.

Step 5: Configure Kerberos authentication for your client.

Your client will need a Kerberos ticket to continue. How you obtain the ticket depends on your OS. If you are using Windows, Appendix A of the ODBC user guide walks you through the process. Other systems will usually have a kinit program pre-installed.

Step 6: Securely connect your favorite analytics tool to Hadoop.

At this point, all that remains is to use the ODBC connection you’ve configured from your analytics tool of choice. Because of the HiveServer2 configuration, all communication will be encrypted.
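
This post focuses on ODBC, but HiveServer2 also serves JDBC clients, and those can request a QoP level in the connection string through the sasl.qop parameter. The Python sketch below just assembles such a URL as a string. The helper name hive_jdbc_url is hypothetical, and the host, port, database and Kerberos principal are placeholders rather than real endpoints, so check the exact URL format against the documentation for your Hive version.

```python
def hive_jdbc_url(host: str, port: int, database: str, principal: str,
                  qop: str = "auth-conf") -> str:
    """Build a HiveServer2 JDBC URL that requests a SASL QoP level."""
    return (f"jdbc:hive2://{host}:{port}/{database}"
            f";principal={principal};sasl.qop={qop}")


if __name__ == "__main__":
    # All values below are placeholders, not real endpoints.
    print(hive_jdbc_url("hs2.example.com", 10000, "default",
                        "hive/_HOST@EXAMPLE.COM"))
```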

More Security Goodness

We’re also excited to announce that, in addition to Kerberos authentication, HDP 2 will support LDAP authentication in HiveServer2. Many customers who don’t want to go through the process of fully “Kerberizing” their clusters find this an easier alternative that still meets their authentication needs.

Summing Up

HDP 2 brings critical improvements in authentication and privacy that are essential to enabling broad-based consumption of Hadoop. We welcome you to try it out today in the HDP Beta or in the HDP Beta Sandbox and give us your feedback.

Download and explore HDP 2.0 Beta over here.
