Enabling Kerberos on HDP and Integrating with Active Directory

How to configure Kerberos on HDP and integrate with Active Directory

Hadoop is a business-critical data platform at many of the world’s largest enterprises. These corporations require a layered security model focusing on four aspects of security: authentication, authorization, auditing, and data protection. Hortonworks continues to innovate in each of these areas, along with other members of the Apache open source community. In this blog, we will look at the authentication layer and how we can enforce strong authentication in HDP via Kerberos.

We assume you have the following in place:

  • Active Directory Domain Controller with Server 2012
  • HDP 2.1 Cluster installed on CentOS or RedHat
  • Windows machine added to the DC/AD

For illustration, our cluster will consist of two nodes (node1.hadoop.com, node2.hadoop.com), and I will use AD.COM as my Active Directory realm and HADOOP.COM as my HDP cluster realm.

How to Kerberize an HDP Cluster

To enable Kerberos on HDP, install a KDC plus all the required software. HDP ships with a prebuilt script that sets up all of our service principals and keytabs and distributes the keytabs across the cluster nodes. Before you run the packaged script, you must install and customize a few things.

First, choose your KDC server and run this command on it. In our case, we installed the KDC on the NameNode, but it could run on any node, inside or outside the cluster.
yum -y install krb5-server krb5-libs krb5-auth-dialog krb5-workstation

Second, install the Kerberos client and utilities on all other nodes in the cluster.
yum -y install krb5-libs krb5-auth-dialog krb5-workstation
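
With more than a couple of nodes, it can help to drive the client install from one machine. The following is a minimal sketch, assuming root SSH access to each node; the function name print_client_installs and the root login are illustrative assumptions, not part of HDP. It only prints the commands, so you can review them before running anything.

```shell
#!/bin/sh
# Print (not run) the Kerberos client install command for each host given.
# Assumption: root SSH access to every node; adjust the user to your setup.
print_client_installs() {
    for host in "$@"; do
        printf 'ssh root@%s yum -y install krb5-libs krb5-auth-dialog krb5-workstation\n' "$host"
    done
}

# Usage: review the output first, then pipe it to sh to execute it.
#   print_client_installs node2.hadoop.com
#   print_client_installs node2.hadoop.com | sh
```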

Third, edit the Kerberos configuration with your HDP and AD realm information:

  • HDP Cluster realm = HADOOP.COM
  • Windows Domain = AD.COM

The krb5.conf file contains Kerberos configuration information, including the locations of KDCs and admin servers for the Kerberos realms we outlined above, defaults for the current realm and for Kerberos applications, and mappings of hostnames onto Kerberos realms. By default krb5.conf is located at /etc/krb5.conf:

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = HADOOP.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true

[realms]
 HADOOP.COM = {
  kdc = node1.hadoop.com
  admin_server = node1.hadoop.com
 }

 AD.COM = {
  kdc = windc.ad.com
  admin_server = windc.ad.com
 }

[domain_realm]
 .hadoop.com = HADOOP.COM
 hadoop.com = HADOOP.COM
 .ad.com = AD.COM
 ad.com = AD.COM

[capaths]
 AD.COM = {
  HADOOP.COM = .
 }

Use the above configuration as a template and replace HADOOP.COM and AD.COM with your own respective realm and Domain Controller (DC) on all cluster hosts.
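
If you keep the example file as a template, the realm substitution can be scripted. This is a minimal sketch, not part of HDP: render_krb5_conf is a hypothetical helper that reads the template on stdin and swaps HADOOP.COM and AD.COM (and their lower-case domain forms) for the realms you pass in. It goes through intermediate placeholder tokens so that a new realm name containing one of the old names is not rewritten twice.

```shell
#!/bin/sh
# Rewrite a krb5.conf template read from stdin, replacing the example realms.
# Hypothetical helper; pass the realm names in upper case.
render_krb5_conf() {
    cluster_realm=$1
    ad_realm=$2
    # Lower-case forms are used for host and domain names.
    cluster_domain=$(printf '%s' "$cluster_realm" | tr '[:upper:]' '[:lower:]')
    ad_domain=$(printf '%s' "$ad_realm" | tr '[:upper:]' '[:lower:]')
    # Pass 1 turns the example names into tokens, pass 2 fills in real values.
    sed -e 's/HADOOP\.COM/@CLUSTER_REALM@/g' \
        -e 's/hadoop\.com/@CLUSTER_DOMAIN@/g' \
        -e 's/AD\.COM/@AD_REALM@/g' \
        -e 's/ad\.com/@AD_DOMAIN@/g' \
        -e "s/@CLUSTER_REALM@/$cluster_realm/g" \
        -e "s/@CLUSTER_DOMAIN@/$cluster_domain/g" \
        -e "s/@AD_REALM@/$ad_realm/g" \
        -e "s/@AD_DOMAIN@/$ad_domain/g"
}

# Usage (file names are placeholders):
#   render_krb5_conf MYCLUSTER.EXAMPLE MYAD.EXAMPLE < krb5.conf.template > krb5.conf
```

Host names such as the kdc and admin_server entries are rewritten along with the domain, so check them by hand before copying the result to /etc/krb5.conf on each host.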

Below is additional information on the sections of krb5.conf that matter for this setup.

[realms]: information on the realms we are working with. The kdc and admin_server values for HADOOP.COM are the host where we installed our KDC server; for Active Directory, they are the host of the Domain Controller.

[domain_realm]: provides a translation from a domain name or hostname to a Kerberos realm name. The tag name can be a host name or a domain name, where domain names are indicated by a leading period (.). Host names and domain names should be in lower case.

[capaths]: for cross-realm authentication, a database is needed to construct the authentication paths between the realms. This section defines that store.

Finally, edit kdc.conf, which by default is located at /var/kerberos/krb5kdc/kdc.conf. It contains KDC configuration information, including defaults used when issuing Kerberos tickets.

[realms]
 HADOOP.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }

Once you have completed all the above steps, you can now Kerberize the cluster using an Ambari script packaged with HDP. To that end, you must generate all Hadoop service principals and their respective keytabs using Apache Ambari.

In Ambari, navigate to the Admin tab > Security and click Enable Security.


When the Ambari Security Wizard appears, click Next to reach the Configure Services page, where you can edit your cluster realm name. In our case, it is HADOOP.COM.


Click Next to reach the Create Principals and Keytabs page. At the bottom of that page, the Download CSV option gives you a file listing all the principals and keytabs you will need in order to run the Kerberos script.


After downloading the CSV file, copy the CSV onto the Ambari server, and execute the Ambari script /var/lib/ambari-server/resources/scripts/kerberos-setup.sh on the Ambari server.

You will also need an SSH private key that grants access to every node in the cluster, so that the script can generate principals and create the appropriate keytabs. Run the script with the command syntax below:

./kerberos-setup.sh <path to host-principal-keytab-list.csv> <path to SSH private key>

For example:
./kerberos-setup.sh /host-principal-keytab-list.csv ~/.ssh/id_rsa

The script generates all the required principals and keytabs, then distributes them around the cluster according to where each Hadoop service is installed. When the script finishes, check on each host that /etc/security/keytabs/ contains keytabs for the Hadoop services installed on that host.
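
A quick way to make that check is to list the keytab files and, where the Kerberos client tools are installed, print the principals each one holds. The sketch below is illustrative only: list_keytabs is a hypothetical helper, and it assumes the keytab files end in .keytab. It relies on klist -kt, which lists the entries of a keytab with their timestamps.

```shell
#!/bin/sh
# List keytab files in a directory and show their principals when possible.
# Returns non-zero if the directory holds no *.keytab files at all.
list_keytabs() {
    dir=${1:-/etc/security/keytabs}
    found=0
    for kt in "$dir"/*.keytab; do
        [ -e "$kt" ] || continue        # unmatched glob: nothing to list
        found=1
        echo "$kt"
        # Show the principals stored in the keytab if klist is installed.
        if command -v klist >/dev/null 2>&1; then
            klist -kt "$kt" || echo "warning: could not read $kt" >&2
        fi
    done
    [ "$found" -eq 1 ]
}

# Usage, run on each cluster host:
#   list_keytabs /etc/security/keytabs || echo "no keytabs on this host"
```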

To finish Kerberizing the cluster, go back to the Ambari Security Wizard. Click Next on the page where we downloaded the principals CSV, and wait for Ambari to restart the cluster services, map principals to the appropriate HDP services, and start everything in Kerberos mode.

Once this completes, you should see all services running.


The next step is to create an administrator principal. Run the command below and provide a strong password.

/usr/sbin/kadmin.local -q "addprinc root/admin"

Now edit /var/kerberos/krb5kdc/kadm5.acl so that it uses your realm:

*/admin@HADOOP.COM *

Then run the command below to obtain a Kerberos ticket as the admin user:

kinit root/admin@HADOOP.COM

When you execute klist, you’ll see a ticket granted for admin.


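Now enter the admin console by running the command below:

kadmin.local

Here we need to add the cross-realm ticket-granting principal, krbtgt/HADOOP.COM@AD.COM, which lets users from AD.COM obtain tickets for services in HADOOP.COM. Run the command in the kadmin.local console to add the principal with the available encryption types: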

addprinc -e "aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal" krbtgt/HADOOP.COM@AD.COM

How to enable Domain and Trust in Active Directory

We managed to Kerberize our HDP cluster. Now let’s walk through enabling our domain and trust in Active Directory. This requires a few steps.

First, we need to configure network security and enable network encryption for Kerberos in the active or default local domain policy. Secure connections from Windows through HiveServer that use BI tools and ODBC require this domain policy.

In the Active Directory console, go to:
Server Manager > Group Policy Management > Domain > Group Policy Objects > Default or Active Domain Policy and Edit

Then navigate to Computer Configuration > Policies > Windows Settings > Security Settings > Local Policies > Security Options and configure Network security: Configure encryption types allowed for Kerberos.


In this case I have chosen every encryption, but you can limit to what you want to enable.

Second, run these commands in PowerShell to make Windows aware of the Kerberos realm on Linux and to create the trust.

ksetup /addkdc HADOOP.COM node1.hadoop.com
netdom trust HADOOP.COM /Domain:AD.COM /add /realm /passwordt:HortonworkS4554

The password used for netdom trust must be the same as the password used for the krbtgt/HADOOP.COM@AD.COM principal.

Third, restart the Kerberos services on the KDC host:
/sbin/service krb5kdc restart
/sbin/service kadmin restart

Fourth, go back to Ambari and map users from the AD domain so that they have access to the HDP cluster, by adding the line shown below to the hadoop.security.auth_to_local text field in Ambari, under the HDFS service in the Advanced section:
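
As a rough illustration of what such a mapping does, a rule of the commonly used form RULE:[1:$1@$0](.*@AD.COM)s/@.*// strips the realm from single-component AD principals, so adis@AD.COM becomes the local user adis. That rule text is an assumption for this example, not necessarily the exact line used on this cluster. The sketch below mimics its effect with shell pattern matching so you can see the outcome without touching Hadoop; map_ad_principal is a hypothetical helper, and principals the rule would not match are simply printed unchanged here.

```shell
#!/bin/sh
# Mimic the effect of an auth_to_local rule that strips the AD.COM realm.
# This is an illustration only; Hadoop evaluates the real rules itself.
map_ad_principal() {
    case $1 in
        */*)      printf '%s\n' "$1" ;;             # has a host part: a [1:...] rule does not apply
        *@AD.COM) printf '%s\n' "${1%@AD.COM}" ;;   # user@AD.COM -> user
        *)        printf '%s\n' "$1" ;;             # other realms: left unchanged
    esac
}

# Usage:
#   map_ad_principal adis@AD.COM
```

On a live cluster, running hadoop org.apache.hadoop.security.HadoopKerberosName adis@AD.COM should print the short name Hadoop actually resolves from the configured rules, which is the more reliable check.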



Now restart the outdated components for the configuration changes to take effect.


Finally, after all the above steps, we can authenticate users from AD and get tickets. I have created a user called adis in AD. Running the command below and entering that user's AD password obtains a ticket:

kinit adis@AD.COM


How To Enable Kerberos Security in Hue

So far we have Kerberized the cluster and configured a one-way domain trust between the Active Directory realm AD.COM and the HDP cluster realm HADOOP.COM. In the final step, we will configure one of the access patterns for querying data from the HDP cluster.

In this case, we will use Hue as the access pattern. In the next blog, we will discuss BI tools like Tableau and other ODBC-compliant applications and access patterns.

If you do not have Hue installed, follow the detailed Hortonworks documentation found here.

Once Hue is installed, configure it to support security with Kerberos by creating the Hue principal and its keytab.

In the kadmin console, run the command below to create the hue principal (node1 is the host where we installed Hue):

kadmin: addprinc -randkey hue/node1.hadoop.com@HADOOP.COM

Next, generate the keytab for hue principal:

kadmin: xst -k /etc/security/keytabs/hue.service.keytab hue/node1.hadoop.com@HADOOP.COM

Make sure hue.service.keytab file is accessible by the hue user; otherwise the service will not start properly. Initialize hue principal using the keytab file:

kinit -k -t /etc/security/keytabs/hue.service.keytab hue/node1.hadoop.com@HADOOP.COM
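
Since a keytab is as sensitive as a password, it is worth confirming that only its owner can read it. The check below is a minimal sketch: keytab_is_private is a hypothetical helper, and treating owner-only access (mode 400 or 600) as the expected state is an assumption about sensible permissions rather than something HDP mandates.

```shell
#!/bin/sh
# Succeed only if the keytab is readable (and at most writable) by its owner.
keytab_is_private() {
    [ -f "$1" ] || { echo "$1: no such file" >&2; return 2; }
    # First ten characters of ls -l output are the file type and mode bits.
    perms=$(ls -ld "$1" | cut -c1-10)
    case $perms in
        -r--------|-rw-------) return 0 ;;
        *) echo "$1 has permissions $perms; expected owner-only access" >&2
           return 1 ;;
    esac
}

# Usage, after giving the file to the hue user (for example with chown):
#   keytab_is_private /etc/security/keytabs/hue.service.keytab
```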


Then configure hue.ini for Kerberos security. Below are the Kerberos-specific settings that you must change (hue.ini can be found at /etc/hue/conf/hue.ini).

# Authentication backend. Common settings

# The search base for finding users and groups

# The NT domain to connect to (only for use with Active Directory)

# URL of the LDAP server

# Distinguished name of the user to bind as -- not necessary if the LDAP server
# supports anonymous searches
# hadoopservice account needs to exist in Active Directory so create one.

# Password of the bind user -- not necessary if the LDAP server supports
# anonymous searches
#this is the hadoopservice password

# Pattern for searching for usernames -- Use <username> for the parameter
# If you are adding your Hadoop users as part of an OU, make sure to include it here during searches as ou=HadoopUsers
# For use when using LdapBackend for Hue authentication
ldap_username_pattern="cn=<username>,cn=Users,dc=ad,dc=com"

# Ignore the case of usernames when searching for existing users in Hue.

# Force usernames to lowercase when creating new users from LDAP.

# Path to Hue's Kerberos keytab file

# Kerberos principal name for Hue

# Path to kinit

# Path to keep Kerberos credentials cached.
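
To make the settings described by these comments concrete, here is a hedged sketch that prints a hue.ini fragment for them. The key names are standard Hue options, but every value is an assumption pieced together from the names used in this walkthrough (AD.COM, windc.ad.com, the hadoopservice account, the hue keytab path), and the bind password is deliberately a placeholder. print_hue_security_ini is a hypothetical helper, so compare its output with your own hue.ini rather than pasting it in blindly.

```shell
#!/bin/sh
# Print an illustrative hue.ini fragment: LDAP login against AD plus Kerberos.
# All values are assumptions for this example; replace them with your own.
print_hue_security_ini() {
    hue_host=$1      # host running Hue, e.g. node1.hadoop.com
    cat <<EOF
[desktop]
  [[auth]]
  backend=desktop.auth.backend.LdapBackend

  [[ldap]]
  base_dn="DC=ad,DC=com"
  nt_domain=ad.com
  ldap_url=ldap://windc.ad.com
  bind_dn="CN=hadoopservice,CN=Users,DC=ad,DC=com"
  bind_password=CHANGE_ME
  ldap_username_pattern="cn=<username>,cn=Users,dc=ad,dc=com"
  ignore_username_case=true
  force_username_lowercase=true

  [[kerberos]]
  hue_keytab=/etc/security/keytabs/hue.service.keytab
  hue_principal=hue/${hue_host}@HADOOP.COM
  kinit_path=/usr/bin/kinit
  ccache_path=/tmp/hue_krb5_ccache
EOF
}

# Usage:
#   print_hue_security_ini node1.hadoop.com
```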

Go through the entire hue.ini file and change every occurrence of 



If this is running on VMs, you might experience hangs at login, so make sure to edit /etc/resolv.conf and add the Domain Controller IP. On a production cluster, this should be done with nscd and caching.

Bring up Hue and log in with your AD username and password.

If you look at the Hue access log, located at /var/log/hue/access.log, you will see the AD username being authenticated.

Furthermore, you can check Active Directory's Event Viewer and see the username AD\adis being authenticated there as well.



We went through three detailed component configuration steps:

  1. Kerberizing the HDP Cluster
  2. Enabling Domain and Trust in Active Directory
  3. Enabling Kerberos Security in Hue

With these three major components successfully configured, you will have a fully functional Kerberized HDP cluster, integrated with Active Directory and accessible via Hue.

What’s Next

In the next blog, we will look at using BI tools like Tableau, and how to authenticate with Kerberos through ODBC/JDBC.

To learn more about Kerberos and how to setup Kerberos with LDAP, read Leonid Fedotov’s in-depth Kerberos Implementation.



November 2, 2014 at 4:53 pm

Good write-up, thanks!

One thing that’s worth noting is that if you’ve pre-configured your KDC via automation, as I do, then the kerberos-setup.sh script that’s supplied with Ambari actually overwrites /etc/krb5.conf with a version from /var/lib/ambari-server/resources/scripts/krb5.conf, resetting the realm back to EXAMPLE.COM and causing failures to generate principals and export keytabs. The workaround is to replace /var/lib/ambari-server/resources/scripts/krb5.conf with your pre-configured /etc/krb5.conf (or you could edit the script to not overwrite the config).

I’ve raised the issue as AMBARI-8112.



    February 11, 2015 at 10:01 am

    Yes. Saw that issue appear in the newer version of Ambari. Thanks Hari.

Ram Baskaran
November 19, 2014 at 3:23 pm

Instead, we can run /var/lib/ambari-server/resources/scripts/keytabs.sh host-principal-keytab-list.csv > keytabs-generate.sh. This is going to generate tar files containing keytabs for every node in the cluster, and we need to place these on the appropriate nodes.

For me running kerberos-setup.sh changed the sudo privileges and I was not able to do sudo.

February 11, 2015 at 9:50 am

From my experience, you may also need to initialize the KDC database manually:
kdb5_util create -s -r myrealm
