Enabling Kerberos on HDP and Integrating with Active Directory

How to configure Kerberos on HDP and integrate with Active Directory

Hadoop is a business-critical data platform at many of the world’s largest enterprises. These corporations require a layered security model focusing on four aspects of security: authentication, authorization, auditing, and data protection. Hortonworks continues to innovate in each of these areas, along with other members of the Apache open source community. In this blog, we will look at the authentication layer and how we can enforce strong authentication in HDP via Kerberos.

We assume you already have the following in place:

  • An Active Directory Domain Controller running Windows Server 2012
  • An HDP 2.1 cluster installed on CentOS or Red Hat
  • A Windows machine joined to the DC/AD

For illustration, our cluster will consist of two nodes (node1.hadoop.com, node2.hadoop.com). I will use AD.COM as my Active Directory realm and HADOOP.COM as my HDP cluster realm.

How to Kerberize an HDP Cluster

To enable Kerberos on HDP, install a KDC plus all the required software. HDP ships with a prebuilt script that will set up all of our service principals and keytabs and distribute the keytabs across the cluster nodes. Before you run the packaged script, you must install and configure a few things.

First, choose a node to act as your KDC server and run this command on it. In our case, we used the NameNode host, but the KDC could run on any node (inside or outside the cluster).
yum -y install krb5-server krb5-libs krb5-auth-dialog krb5-workstation

Second, install the Kerberos client and utilities on all other nodes in the cluster.
yum -y install krb5-libs krb5-auth-dialog krb5-workstation

Third, edit KDC configuration with your HDP and AD realm information:

  • HDP Cluster realm = HADOOP.COM
  • Windows Domain = AD.COM

The krb5.conf file contains Kerberos configuration information, including the locations of KDCs and admin servers for the Kerberos realms we outlined above, defaults for the current realm and for Kerberos applications, and mappings of hostnames onto Kerberos realms. By default krb5.conf is located at /etc/krb5.conf:

 
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = HADOOP.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true

[realms]
 HADOOP.COM = {
  kdc = node1.hadoop.com
  admin_server = node1.hadoop.com
 }

 AD.COM = {
  kdc = windc.ad.com
  admin_server = windc.ad.com
 }

[domain_realm]
 .hadoop.com = HADOOP.COM
 hadoop.com = HADOOP.COM
 .ad.com = AD.COM
 ad.com = AD.COM

[capaths]
 AD.COM = {
  HADOOP.COM = .
 }

Use the above configuration as a template, replace HADOOP.COM and AD.COM with your own realm and Domain Controller (DC) names, and copy it to all cluster hosts.

Below is additional information on each section in krb5.conf.

realms

This section lists the realms we are working with. The kdc and admin_server values for HADOOP.COM point to the host where we installed our KDC server; for Active Directory, they point to the Domain Controller host.

domain_realm

This section provides a translation from a domain name or hostname to a Kerberos realm name. The tag name can be a host name or a domain name, where domain names are indicated by a leading period (.). Host names and domain names should be in lower case.

capaths

For cross-realm authentication, a database is needed to construct the authentication paths between the realms. This section defines that database.

Finally, edit kdc.conf, which by default is located at /var/kerberos/krb5kdc/kdc.conf. It contains KDC configuration information, including defaults used when issuing Kerberos tickets.

 
[realms]
  HADOOP.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }
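Note that on a fresh KDC installation you also need to create the Kerberos database and start the KDC daemons before any principals can be added. A minimal sketch for the realm above (these are standard MIT Kerberos administration commands, not part of the HDP script):

kdb5_util create -s -r HADOOP.COM
/sbin/service krb5kdc start
/sbin/service kadmin start

The -s flag stashes the master key so the KDC can start without prompting for it.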

Once you have completed all the above steps, you can Kerberize the cluster using an Ambari script packaged with HDP. To do so, you will generate all Hadoop service principals and their respective keytabs with the help of Apache Ambari.

In Ambari, navigate to Admin > Security and click Enable Security:

[Screenshot: Ambari Admin tab with the Security section and Enable Security button]

When the Ambari Security Wizard appears, click Next to reach the Configure Services page, where you can edit your cluster realm name; in our case, it is HADOOP.COM.

[Screenshot: Ambari Security Wizard, Configure Services page showing the realm name]

Click Next to reach the Create Principals and Keytabs page, which has a Download CSV option at the bottom containing all the principal and keytab information you will need in order to run the Kerberos script.

[Screenshot: Create Principals and Keytabs page with the Download CSV option]

After downloading the CSV file, copy it onto the Ambari server and execute the packaged script /var/lib/ambari-server/resources/scripts/kerberos-setup.sh there.

You will also need an SSH private key with access to all the nodes in the cluster so the script can generate principals and create the appropriate keytabs. Run the script with the command syntax below:

Usage: kerberos-setup.sh HOST_PRINCIPAL_KEYTABLE.csv SSH_LOGIN_KEY_PATH

For Example:
./kerberos-setup.sh /host-principal-keytab-list.csv ~/.ssh/id_rsa

The script generates all the required principals and keytabs and distributes them across the cluster according to where the appropriate Hadoop services are installed. When the script finishes, ensure that /etc/security/keytabs/ on each host contains keytabs for the Hadoop services installed there.
[Screenshot: keytab files under /etc/security/keytabs/]
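To sanity-check a keytab, you can list the principals it contains and try to obtain a ticket with it. The keytab and principal names below follow the usual HDP naming conventions for the NameNode on node1; adjust them to whatever appears in your downloaded CSV:

klist -kt /etc/security/keytabs/nn.service.keytab
kinit -kt /etc/security/keytabs/nn.service.keytab nn/node1.hadoop.com@HADOOP.COM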

To finalize Kerberizing the cluster, go back to the Ambari Security Wizard. Click Next on the page where we downloaded the principals CSV and wait for Ambari to restart the cluster services, map principals to the appropriate HDP services, and start everything in Kerberos mode.

Once completed, you should see a screen shown below with all services running.

[Screenshot: Ambari dashboard with all services running]

The next step is to create a Kerberos administrator principal. Run the command below and provide a strong password.

/usr/sbin/kadmin.local -q "addprinc root/admin"

Now edit /var/kerberos/krb5kdc/kadm5.acl so that it grants admin privileges for your realm:

*/admin@HADOOP.COM *

Then, run the command below to obtain a Kerberos ticket as the admin user:

kinit root/admin@HADOOP.COM

When you execute klist, you’ll see a ticket granted for admin.

[Screenshot: klist output showing the ticket for root/admin@HADOOP.COM]
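The output should look roughly like the following (cache location, times, and flags will differ on your system):

klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: root/admin@HADOOP.COM

Valid starting       Expires              Service principal
<issue time>         <expiry time>        krbtgt/HADOOP.COM@HADOOP.COM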

Now enter the admin console by running the command below:
kadmin.local

Here we need to add the cross-realm trust principal, krbtgt/HADOOP.COM@AD.COM. Run the command below in the kadmin.local console to add the principal with the supported encryption types:

addprinc -e "aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal" krbtgt/HADOOP.COM@AD.COM
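To verify that the trust principal was created with the expected encryption types, you can inspect it from the same console; for example:

kadmin: getprinc krbtgt/HADOOP.COM@AD.COM

The output lists the principal's attributes, including one key entry per encryption type specified above.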

How to Enable Domain and Trust in Active Directory

We managed to Kerberize our HDP cluster. Now let's walk through enabling our domain and trust in Active Directory. This requires a few steps.

First, we need to configure network security and enable the Kerberos encryption types in the active or default domain policy. This domain policy is required for secure connections from Windows BI tools and ODBC clients to HiveServer.

In the Active Directory console, go to:
Server Manager > Group Policy Management > Domain > Group Policy Objects > Default or Active Domain Policy and Edit
[Screenshot: Group Policy Management with the Default Domain Policy selected]

Then navigate to Computer Configuration > Policies > Windows Settings > Security Settings > Local Policies > Security Options and configure Network security: Configure encryption types allowed for Kerberos.

[Screenshot: Network security: Configure encryption types allowed for Kerberos policy]

In this case I have selected every encryption type, but you can limit the list to the types you want to allow.

Second, run these commands in PowerShell to configure Windows Kerberos with the Linux realm and set up the trust.

ksetup /addkdc HADOOP.COM node1.hadoop.com
netdom trust HADOOP.COM /Domain:AD.COM /add /realm /passwordt:HortonworkS4554
ksetup /SetEncTypeAttr HADOOP.COM DES-CBC-CRC DES-CBC-MD5 RC4-HMAC-MD5 AES128-CTS-HMAC-SHA1-96 AES256-CTS-HMAC-SHA1-96

The password used for netdom trust must be the same password that was set for krbtgt/HADOOP.COM@AD.COM.
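You can confirm on the Windows side that the realm mapping and encryption types were registered; for example, in the same PowerShell session:

ksetup
ksetup /getenctypeattr HADOOP.COM

Running ksetup with no arguments lists the configured realms and their KDCs, and /getenctypeattr shows the encryption types allowed for the HADOOP.COM realm.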

Third, restart Kerberos services:
/sbin/service krb5kdc restart
/sbin/service kadmin restart

Fourth, go back to Ambari and map users from the AD domain to local users in the HDP cluster by adding the rule shown below to the hadoop.security.auth_to_local property, found in Ambari under the HDFS service in the Advanced section:

[Screenshot: hadoop.security.auth_to_local field under the HDFS service in Ambari]

RULE:[1:$1@$0](.*@AD.COM)s/@.*//
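This rule strips the @AD.COM suffix, so a principal such as adis@AD.COM maps to the local user adis; keep the existing DEFAULT rule in place for the HADOOP.COM service principals. Once the property is saved and the services restarted (next step), you can check the mapping from any cluster node with Hadoop's built-in resolver, which should print something like "Name: adis@AD.COM to adis":

hadoop org.apache.hadoop.security.HadoopKerberosName adis@AD.COM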

Now restart the outdated components for the configuration changes to take effect.

[Screenshot: Ambari prompting to restart outdated components]

Finally, after all the above steps, we can authenticate users from AD and obtain tickets. I have created a user called adis in AD; I will obtain a ticket for it by running the command below and entering the AD user's password:

kinit adis@AD.COM

[Screenshot: kinit and klist for adis@AD.COM]
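With a ticket in hand and the auth_to_local rule above in place, a quick end-to-end check is to access HDFS as the AD user; a simple directory listing is enough to prove the authentication works:

hdfs dfs -ls /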

How to Enable Kerberos Security in Hue

So far we have Kerberized the cluster and configured a one-way domain trust between the Active Directory domain AD.COM and the HDP cluster realm HADOOP.COM. In the final step, we will configure one of the access patterns for querying data from the HDP cluster.

In this case, we will use Hue as one of these access patterns; in the next blog, we will discuss BI tools like Tableau and other ODBC-compliant applications and access patterns.

If you do not have Hue installed, follow the detailed Hortonworks documentation found here.

Once Hue is installed, configure it to support security with Kerberos by creating the Hue principal and its keytab.

In the kadmin console, run the command below to create the hue principal (node1 is the host where we installed Hue):

kadmin: addprinc -randkey hue/node1.hadoop.com@HADOOP.COM

Next, generate the keytab for the hue principal:

kadmin: xst -k /etc/security/keytabs/hue.service.keytab hue/node1.hadoop.com@HADOOP.COM

Make sure the hue.service.keytab file is readable by the hue user; otherwise the service will not start properly. Initialize the hue principal using the keytab file:

kinit -k -t /etc/security/keytabs/hue.service.keytab hue/node1.hadoop.com@HADOOP.COM

[Screenshot: kinit using the hue service keytab]
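If the keytab is not readable by the hue user, a common fix is to change its ownership and tighten its permissions; the group name below is an assumption and may differ in your environment:

chown hue:hadoop /etc/security/keytabs/hue.service.keytab
chmod 400 /etc/security/keytabs/hue.service.keytab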

Then configure hue.ini for Kerberos security. Below are the Kerberos-specific settings that you must change (hue.ini can be found at /etc/hue/conf/hue.ini):

# Authentication backend. Common settings
backend=desktop.auth.backend.LdapBackend

# The search base for finding users and groups
base_dn="DC=ad,DC=com"

# The NT domain to connect to (only for use with Active Directory)
nt_domain=ad.com

# URL of the LDAP server
ldap_url=ldap://windc.ad.com:389

# Distinguished name of the user to bind as -- not necessary if the LDAP server
# supports anonymous searches
# hadoopservice account needs to exist in Active Directory so create one.
bind_dn="cn=hadoopservice,cn=Users,dc=ad,dc=com"

# Password of the bind user -- not necessary if the LDAP server supports
# anonymous searches
#this is the hadoopservice password
bind_password=HortonworkS4554

# Pattern for searching for usernames -- Use <username> for the parameter
# If your Hadoop users live in an OU, make sure to include it here during searches, e.g. ou=HadoopUsers
# For use when using LdapBackend for Hue authentication
ldap_username_pattern="cn=<username>,cn=Users,dc=ad,dc=com"
# Ignore the case of usernames when searching for existing users in Hue.
ignore_username_case=true

# Force usernames to lowercase when creating new users from LDAP.
force_username_lowercase=true

# Path to Hue's Kerberos keytab file
hue_keytab=/etc/security/keytabs/hue.service.keytab

# Kerberos principal name for Hue
hue_principal=hue/node1.hadoop.com

# Path to kinit
kinit_path=/usr/bin/kinit

# Path to keep Kerberos credentials cached.
ccache_path=/tmp/hue_krb5_ccache
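Keep in mind that these properties do not all live in one place in hue.ini: in a default install the authentication backend sits under the [desktop] section's [[auth]] subsection, the LDAP settings under [[ldap]], and the Kerberos settings under [[kerberos]]. A rough sketch of the layout, using the values above (section names assumed from a stock Hue configuration, so verify against your own file):

[desktop]
  [[auth]]
    backend=desktop.auth.backend.LdapBackend
  [[ldap]]
    base_dn="DC=ad,DC=com"
    nt_domain=ad.com
    ldap_url=ldap://windc.ad.com:389
    bind_dn="cn=hadoopservice,cn=Users,dc=ad,dc=com"
    bind_password=HortonworkS4554
  [[kerberos]]
    hue_keytab=/etc/security/keytabs/hue.service.keytab
    hue_principal=hue/node1.hadoop.com
    kinit_path=/usr/bin/kinit
    ccache_path=/tmp/hue_krb5_ccache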

Go through the entire hue.ini file and set every occurrence of security_enabled to:

security_enabled=true

If this is running on VMs, you might experience long hangs at login, so make sure to edit /etc/resolv.conf and add the Domain Controller's IP. On a production cluster this should be handled with nscd and proper DNS caching.
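For example, a minimal /etc/resolv.conf pointing at the Domain Controller might look like this (the IP address is purely illustrative; use your DC's address):

# windc.ad.com is the Domain Controller
search hadoop.com ad.com
nameserver 10.0.0.10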

Bring up Hue and log in with your AD username and password.

[Screenshot: Hue login page]

If you look at the Hue access log, located at /var/log/hue/access.log, you will see the AD username being authenticated:

[Screenshot: Hue access log showing the AD username]

Furthermore, you can check the Event Viewer on the Active Directory Domain Controller and see the username AD\adis being authenticated there as well:

[Screenshot: Active Directory Event Viewer showing AD\adis authentication]

Conclusion

We went through three detailed component configuration steps:

  1. Kerberizing the HDP Cluster
  2. Enabling Domain and Trust in Active Directory
  3. Enabling Kerberos Security in Hue

With these three major components successfully configured, you will have a fully functional Kerberized HDP cluster, integrated with Active Directory and accessible via Hue.

What’s Next

In the next blog, we will look at using BI tools like Tableau, and how to authenticate with Kerberos through ODBC/JDBC.

To learn more about Kerberos and how to setup Kerberos with LDAP, read Leonid Fedotov’s in-depth Kerberos Implementation.
