January 06, 2016

Best practices in HDFS authorization with Apache Ranger

HDFS is a core part of any Hadoop deployment, and to ensure that data is protected in the Hadoop platform, security needs to be baked into the HDFS layer. HDFS is protected using Kerberos for authentication, with authorization enforced through POSIX-style permissions, HDFS ACLs, or Apache Ranger.

Apache Ranger is a centralized security administration solution for Hadoop that enables administrators to create and enforce security policies for HDFS and other Hadoop platform components.

How do Ranger policies work for HDFS?

To ensure security in HDP environments, we recommend that all of our customers implement Kerberos, Apache Knox, and Apache Ranger.

Apache Ranger offers a federated authorization model for HDFS. The Ranger plugin for HDFS checks for Ranger policies, and if a policy exists, access is granted to the user. If no policy exists in Ranger, then Ranger defaults to the native permissions model in HDFS (POSIX or HDFS ACL). This federated model applies to the HDFS and YARN services in Ranger.
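The decision flow above can be sketched roughly as follows. This is an illustrative stand-in, not real Ranger internals: the paths matched in the case statement stand in for "a Ranger policy exists for this resource", and the echoed values mirror the Access Enforcer names that appear in Ranger audit logs.

```shell
# Hedged sketch of Ranger's federated authorization flow for HDFS.
authorize() {   # usage: authorize <hdfs-path>
  case "$1" in
    /apps/hive|/apps/hive/*|/apps/hbase|/apps/hbase/*)
      echo "ranger-acl"   # a Ranger policy matched: Ranger decides
      ;;
    *)
      echo "hadoop-acl"   # no policy: fall back to native POSIX/ACL checks
      ;;
  esac
}

authorize /apps/hive/warehouse  # -> ranger-acl
authorize /tmp/scratch          # -> hadoop-acl
```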


For other services such as Hive or HBase, Ranger operates as the sole authorizer, which means only Ranger policies are in effect. The fallback option is controlled by the xasecure.add-hadoop-authorization property, configured in Ambari under HDFS → Configs → Advanced ranger-hdfs-security.
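This fallback toggle corresponds to the xasecure.add-hadoop-authorization property (as one of the commenters below also points out); a sketch of the entry as it would appear in ranger-hdfs-security.xml:

```xml
<!-- Fall back to native HDFS permissions when no Ranger policy matches -->
<property>
  <name>xasecure.add-hadoop-authorization</name>
  <value>true</value>
</property>
```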


The federated authorization model enables customers to safely implement Ranger in an existing cluster without affecting jobs that rely on POSIX permissions. We recommend enabling this option as the default model for all deployments.

Ranger’s user interface makes it easy for administrators to find out which permission (Ranger policy or native HDFS) granted a user access. Users can simply navigate to Ranger → Audit and look at the Access Enforcer column of the audit data. If the value in the Access Enforcer column is “ranger-acl”, a Ranger policy provided access to the user. If the value is “hadoop-acl”, access was granted by a native HDFS ACL or POSIX permission.


Best practices for HDFS authorization

A federated authorization model can create a challenge for security administrators planning a security model for HDFS.

After Apache Ranger and Hadoop have been installed, we recommend that administrators implement the following steps:

  • Change the HDFS umask to 077
  • Identify directories that can be managed by Ranger policies
  • Identify directories that need to be managed by HDFS native permissions
  • Enable a Ranger policy to audit all records

Here are the steps again in detail.

  1. Change the HDFS umask to 077 from the default of 022. This prevents any new files or folders from being accessed by anyone other than the owner.

Administrators can change this property (fs.permissions.umask-mode) via Ambari.

The default umask value in HDFS is 022, which grants all users read access to new HDFS folders and files. You can verify this by running the following command on a recently installed Hadoop cluster:

$ hdfs dfs -ls /apps
Found 3 items
drwxrwxrwx   - falcon hdfs          0 2015-11-30 08:02 /apps/falcon
drwxr-xr-x   - hdfs   hdfs          0 2015-11-30 07:56 /apps/hbase
drwxr-xr-x   - hdfs   hdfs          0 2015-11-30 08:01 /apps/hive
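You can see the effect of a 077 umask by experimenting on a local filesystem; HDFS applies its fs.permissions.umask-mode setting to newly created files and directories in the same way:

```shell
# Demonstrate what a 077 umask does to newly created files and directories.
# This runs against the local filesystem purely for illustration.
tmp=$(mktemp -d)
cd "$tmp"
umask 077
mkdir newdir
touch newfile
stat -c '%a %n' newdir newfile
# directories get 777 & ~077 = 700; files get 666 & ~077 = 600
```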

  2. Identify the directories that can be managed by Ranger policies

We recommend that permissions for application data folders (/apps/hive, /apps/hbase) as well as any custom data folders be managed through Apache Ranger. The HDFS native permissions for these directories need to be restrictive, which can be done by changing the permissions in HDFS using chmod.


$ hdfs dfs -chmod -R 000 /apps/hive
$ hdfs dfs -chown -R hdfs:hdfs /apps/hive
$ hdfs dfs -ls /apps/hive
Found 1 items
d---------   - hdfs hdfs          0 2015-11-30 08:01 /apps/hive/warehouse

Then navigate to the Ranger admin UI and grant explicit permissions to users as needed.

Administrators should follow the same process for other data folders as well. You can validate whether your changes are in effect by doing the following:

  • Connect to HiveServer2 using beeline
  • Create a table
    • create table employee(id int, name string, ssn string);
  • Go to Ranger and check the HDFS access audit. The Access Enforcer value should be “ranger-acl”.
  3. Identify the directories that need to be managed by HDFS native permissions

We recommend letting HDFS manage the permissions for the /tmp and /user folders. These are used by applications and jobs that create user-level directories.

You should also set the initial permissions for the folders under /user to “700”, as in the example below:


$ hdfs dfs -ls /user
Found 4 items
drwxrwx---   - ambari-qa hdfs          0 2015-11-30 07:56 /user/ambari-qa
drwxr-xr-x   - hcat      hdfs          0 2015-11-30 08:01 /user/hcat
drwxr-xr-x   - hive      hdfs          0 2015-11-30 08:01 /user/hive
drwxrwxr-x   - oozie     hdfs          0 2015-11-30 08:02 /user/oozie


$ hdfs dfs -chmod -R 700 /user/*
$ hdfs dfs -ls /user
Found 4 items
drwx------   - ambari-qa hdfs          0 2015-11-30 07:56 /user/ambari-qa
drwx------   - hcat      hdfs          0 2015-11-30 08:01 /user/hcat
drwx------   - hive      hdfs          0 2015-11-30 08:01 /user/hive
drwx------   - oozie     hdfs          0 2015-11-30 08:02 /user/oozie

  4. Ensure auditing for all HDFS data.

Auditing in Apache Ranger can be controlled as a policy. When Apache Ranger is installed through Ambari, a default policy is created for all files and directories in HDFS, with the auditing option enabled. This policy is also used by the Ambari smoke-test user “ambari-qa” to verify the HDFS service through Ambari. If administrators disable this default policy, they need to create a similar policy to enable auditing across all files and folders.



Securing HDFS files through permissions is a starting point for securing Hadoop. Ranger provides a centralized interface for managing security policies for HDFS. We recommend that security administrators use a combination of HDFS native permissions and Ranger policies to provide comprehensive coverage for all potential use cases. Using the best practices outlined in this blog, administrators can simplify the access control policies for administrative and user directories and files in HDFS.



Jeff Wright says:

This article starts off saying “change umask to 077”, then later says “change umask to 700”. Should all of those be 700, to restrict access to owner only?

Vishal Prakash Shah says:

This article mentions to change umask to 077. And change permission for “/tmp” and “/user” to 700. Both are different.
This link explains the purpose of umask.

Ceoni says:

All the same!

perm 777
umask 077 (-)
result 700 (for directories)

kundan says:

What is a recommended way to set up policies when trying to control access to Storm over a secure channel using Apache Ranger?

kundan says:

Is it possible to control access (submit topology and kill topology) of users other than Storm user in Apache ranger plugin for Apache Storm?

San Domingo says:

In ambari, “xasecure.add-hadoop-authorization” is located in: HDFS -> Advanced -> Advanced ranger-hdfs-security. The first tab is “HDFS” rather “Ranger”

David Hamilton says:

Security administrators are recommended to use a combination of HDFS native permissions and Ranger policies to provide comprehensive coverage for all potential use cases.

Can you please elaborate on why we would choose HDFS native instead of Ranger permissions? Are there any tools which can not leverage Ranger permissions?

peter rowan says:

How would you manage this in development, for example, where you want not just the owner but a group of users (e.g. developers) to share access to a table, but no one else? This is very common: e.g. a role group whose members should have access to tables as part of a security structure whose role says they can access that data.

kahdeer says:

I have installed ranger and ranger kms and setup all the configurations and everything is working fine.

I have created an encryption zone in HDFS, and in the policy I have mentioned two users (user1 and user2) who can access this encryption zone, and they are able to access it. I want to set permissions on the encryption zone such that user1 has read and write access and user2 has only read access. How can we define this?

David Moore says:

Umask sets the default permissions opposite of the umask value. So by setting umask to 077, you are setting the default permissions of any new files or directories to 700, which will “allow read, write, and execute permission for the file’s owner, but prohibit read, write, and execute permission for everyone else”.
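As the commenters work out above, the effective mode is the default mode ANDed with the complement of the umask; this arithmetic can be checked directly in the shell:

```shell
# Verify the umask arithmetic: effective mode = default mode & ~umask.
printf '%03o\n' $(( 0777 & ~0077 ))   # directories: 700
printf '%03o\n' $(( 0666 & ~0077 ))   # files: 600
```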
