Get fresh updates from Hortonworks by email

Once a month, receive latest insights, trends, analytics information and knowledge of Big Data.

cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button
August 12, 2014
prev slideNext slide

Secure JDBC and ODBC Clients’ Access to HiveServer2

Introduction

HDP 2.1 ships with Apache Knox 0.4.0. This release of Apache Knox supports WebHDFS, WebHCAT, Oozie, Hive, and HBase REST APIs.

Hive is a popular component used for SQL access to Hadoop, and the Hive Server 2 with Thrift supports JDBC access over HTTP. The following steps show the configuration to enable a JDBC client to talk to Hive Server 2 via Knox (Beeline > JDBC over HTTPS > Knox > HTTP > Hive Server2). The picture describes the scenario that the blog covers.
jdbc_knox

This blog focuses on Beeline as the JDBC client; however, a screenshot of Simba ODBC Client configuration for ODBC access is attached at the bottom of the blog.

Step 1

Using Ambari, navigate to Hive > Config. In custom hive-site section set the the following:

  • hive.server2.thrift.http.path to cliservice
  • hive.server2.thrift.http.port
  • hive.server2.transport.mode to http

Note: If you get errors upon Hive Server 2 restart with port 10001, please use 10000 as the value for hive.server2.thrift.http.port and use 10000 in your Knox topology (sandbox.xml file is the topology in this blog )

Here is an example:

knox_1

Step 2

Save these Hive settings and restart HiveServer with Ambari.

Step 3

Ensure that the values you specified match the value for HIVE service. For example, here is my HIVE service values in /etc/knox/conf/topology/sandbox.xml that works against the values I specified in steps 1-3.


<service>
<role>HIVE</role>
<url>http://sandbox:10001/cliservice</url>
</service>

Step 4

Run beeline client.

/usr/bin/beeline

Use the connection string specified in step 5. Note the BOLD values need to be customized for your environment. The values below work in the HDP 2.1 Sandbox and match what I specified in steps 1-3

Step 5

Connect to the Hive Server 2

beeline> !connect jdbc:hive2://sandbox:8443/;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knox?hive.server2.transport.mode=http;hive.server2.thrift.http.path=gateway/sandbox/hive

Enter username and password that Beeline will send to Knox over HTTPS to authenticate the user. The Knox included with HDP 2.1 Sandbox has the account with the username “guest” and the password “guest-password.” Knox also supports using LDAP/AD for authentication, and once you configure Knox against LDAP, you can use any LDAP user to authenticate instead of guest account used in this example.

Note: The value for trustStorePassword is the Knox’s master password when Knox was setup. The default value of this password is “knox” only in HDP Sandbox.

Enter username for
jdbc:hive2://sandbox:8443/;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knox?hive.server2.transport.mode=http;hive.server2.thrift.http.path=gateway/sandbox/hive: guest

Enter password for
jdbc:hive2://sandbox:8443/;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knox?hive.server2.transport.mode=http;hive.server2.thrift.http.path=gateway/sandbox/hive: **************

Connected to: Apache Hive (version 0.13.0.2.1.1.0-237)

In the Beeline connection string, a trust store for HTTPS connection to Knox is specified. This truststore (and its password) is needed only when Knox is not configured to use a well-known SSL certificate. For example, out-of-box, Knox Gateway uses a Self-Signed certificate for SSL, and that certificate needs to be exported and put into a file that the client can use.

However, in a production environment, Knox should be configured to use a CA authorized SSL certificate, and on the JDBC client, you need not configure a truststore and truststore password.

Step 5 with HDP 2.2

With HDP 2.2 the beeline connection URL can be simplified to
beeline> !connect jdbc:hive2://red1:8443/;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knox

Note the sslTrustStore and it’s password are needed only when Knox is configured to use Self-Signed SSL Certificate.

Note

When SSL is used between Beeline & Knox and SSL certificate is either self-signed or less well known one needs to import the certificate into trust store of the JVM used by Knox.

Without this step you will get an SSL error like

“Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:397)
at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:126)
at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:437) ”

To avoid the error import the SSL certificate as shown below.

Simplifying Beeline Connection String:

1. Export the Knox Gateway’s certificate to a file
2. Copy the file to the host running Beeline
3. Import the certificate from the file into the JDK’s CACert used to run beeline

Here is an example code, note change the paths for your setup.
sudo /usr/jdk64/jdk1.7.0_67/bin/keytool -exportcert -alias gateway-identity -keystore /var/lib/knox/data/security/keystores/gateway.jks -file knoxred1.crt

Copy knoxred1.crt to a host running Beeline

On the host running Beeline, Import Knox’s cert into CACerts

sudo /usr/jdk64/jdk1.7.0_67/bin/keytool -import -trustcacerts -file knoxred1.crt -alias gateway-identity -keystore /usr/jdk64/jdk1.7.0_67/jre/lib/security/cacerts

After doing this the beeline connection string can be simplified as

beeline> !connect jdbc:hive2://red1:8443/;transportMode=http;httpPath=gateway/default/hive;ssl=true;

Step 6

Then issue any SQL query, and the request will follow the path from Beeline > JDBC over HTTPS to Knox > over HTTP to Hive Server 2

For example, the show tables query results in the following output in HDP 2.1 Sandbox.

knox_2

Hive ODBC Client Configuration

The following screenshot illustrates the ODBC client side configuration needed to get  ODBC > HTTP > Knox > HTTP > Hive Server 2 setup working. You can download the ODBC driver from https://hortonworks.com/hdp/addons/

knox3

Optional : Connection to Knox without SSL

Apache Knox comes with SSL enabled out of box and the instructions so far in the blog provide instructions for connection from a JDBC client to Knox over SSL enabled channel.

To make JDBC/ODBC connection to Knox with out SSL you first need to disable SSL at  Knox  and then change the JDBC URL to not be SSL enabled. Note, with out  SSL connection to Knox, passwords sent over the wire can be sniffed.

Stop Knox
su -l knox -c /usr/lib/knox/bin/gateway.sh stop

Edit Knox Gateway config at /etc/knox/conf/gateway-site.xml (or another location if you install Knox at some non-default location)

Put “<property><name>ssl.enabled</name><value>false></value></property>”  in the gateway-site.xml and re-start Apache Knox

su -l knox -c /usr/lib/knox/bin/gateway.sh start

Run Beeline


beeline> !connect jdbc:hive2://sandbox:8443/;hive.server2.transport.mode=http;hive.server2.thrift.http.path=gateway/sandbox/hive
beeline> show tables;

Conclusion

Apache Knox provides the ability to secure Hadoop’s REST API centrally. With Apache Knox, the REST/HTTP access to Hadoop benefits from centralized authentication, authorization, audit, identity management integration and SSO. In this blog we covered the configuration and steps needed for JDBC & ODBC clients to connect via Knox to Hive Server2.

Knox supports Hadoop cluster access both with & with out Kerberos enabled. Refer to Knox Admin guide for steps to configure Knox access to a Kerberos enabled Hadoop cluster.

If you have any comments or feedback, please email me at vshukla@hortonworks.com or post your questions on our Community Forums under Security.

Tags:

Comments

  • If i use beeline in non-interactive mode using below command –
    beeline -u ‘jdbc:hive2://xxxx.com:10001/default;transportMode=http;httpPath=cliservice’ -n …. -p …. -f test.hql
    my credentials will not be secure as anyone can run ps -ef | grep beeline and see my password.
    How can I use beeline securely in non-interactive mode?

  • Add the below in your .bashrc and then source the file. This will help to create the alias in the Edge Node and then simple alias can help.

    Adding the alias
    ——————————
    $ vim ~/.bashrc

    alias hbcli=’beeline -u “jdbc:hive2://ip-hive-server-ip separated with hifen>:10000/default;principal=driver-details” -n $USER’

    $ source .bashrc

    $ hbcli

  • Leave a Reply

    Your email address will not be published. Required fields are marked *

    If you have specific technical questions, please post them in the Forums

    You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>