
Tag Based Policies with Apache Ranger and Apache Atlas

Introduction

Hortonworks has recently announced the integration of Apache Atlas and Apache Ranger, introducing the concept of tag-based (or classification-based) policies. Enterprises can classify data in Apache Atlas and use the classification to build security policies in Apache Ranger.

This tutorial walks through an example of tagging data in Atlas and building a security policy in Ranger.

Prerequisites

Outline

1. Start Kafka, HBase, Ambari Infra, Ranger and Atlas

1.1 View the Services Page

Start by logging into Ambari as the raj_ops user.

ambari_dashboard_rajops

1.2 Enable Ranger Audit to Solr

Click on the Ranger Service from the Ambari Stack of services on the left side column.

Select the Configs tab.

Select the Ranger Audit tab. Turn ON Ranger’s Audit to Solr feature by clicking on the OFF configuration under Audit to Solr.

Save the configuration. In the Save Configuration window that appears, enter Turn ON Audit to Solr Feature as the note, then click Save. Click OK on the Dependent Configurations window, then click Proceed Anyway.

enable_audit_to_solr

Now restart Ranger and all other affected services for the configuration to take effect. Click on Actions, Restart All Required, then Confirm Restart All. These configuration updates are needed so we can use tag-based policies with Atlas and Ranger.

1.3 Start Kafka Service

From the Kafka page, click on Service Actions -> Start

start_kafka

Check the box and click on Confirm Start:

confirmation_kafka

Wait for Kafka to start (it may take a few minutes to turn green).

new_started_kafka

Now start the other required services: HBase and Ambari Infra. Ranger Tagsync is stopped by default, so restart Ranger as well. Finally, start Atlas. Your Ambari dashboard page should look like this:

new_ambari_dashboard

1.4 Restart Hive Service

From within the Hive service, click on Service Actions -> Restart

Since we will be monitoring user activity in Hive using Ranger, Hive must be restarted because it was affected by the Ranger configuration change.

2. General Information

Ranger can be accessed using the following credentials:

User id – raj_ops
Password – raj_ops

And for Atlas:

User id – holger_gov
Password – holger_gov

In the tutorial steps below, we are going to use the user ids raj_ops and maria_dev. You can log into the Ambari view using the following credentials:

User id – raj_ops
Password – raj_ops

User id – maria_dev
Password – maria_dev

3. Sample Ranger Policy for Different User Personas

Navigate to the Ranger UI by typing 127.0.0.1:6080 in your browser; you will see the Ranger login page:

Use username – raj_ops and password – raj_ops

ranger_login_rajops

Press the Sign In button and the Ranger home page will be displayed. Click on Sandbox_hive.

click_sandbox_hive_rajops

You will now see a list of policies under the Sandbox_hive repository. Click on the last policy, the one for raj_ops, holger_gov, maria_dev and amy_ds:

click_policy_for_all_users_rajops

This policy applies to these four users and covers all tables and all columns of the foodmart database.

sample_foodmart_policy_rajops

To check the type of access this policy grants these users, scroll down:

allow_condition_sample_policy_rajops

You can grant users whatever access fits their roles in the organization.

4. Access Without Tag Based Policies

Let’s create a Hive table named employee from Ambari Hive View 2.0.

1. Go back to Ambari, open Hive View 2.0 from the 9-square menu icon, and type the following create table query:

create table employee (ssn string, name string, location string)
row format delimited
fields terminated by ','
stored as textfile;

2. Click on the green Execute button.

create_hive_table

You can check whether the table was created by going to the TABLES tab and refreshing the list of tables. Select the default database and you will see the new employee table.

list_hive_table

Now let’s put some records into this table.

3. First SSH into the Hortonworks Sandbox with the command:

ssh root@127.0.0.1 -p 2222

sshTerminal

4. Create the employeedata.txt file with the following data using the command:

printf "111-111-111,James,San Jose\n222-222-222,Mike,Santa Clara\n333-333-333,Robert,Fremont" > employeedata.txt

5. Copy the employeedata.txt file from the sandbox's CentOS file system into HDFS on HDP with the hdfs command:

hdfs dfs -copyFromLocal employeedata.txt /apps/hive/warehouse/employee

6. Now let’s go back to Hive View 2.0 to view this data.

Go to the TABLES tab, note the employee table name, then head back to the QUERY tab and run the Hive query:

select * from employee;

You will be able to see the data.

employee_data

In this first scenario, you have an employee table in Apache Hive with ssn, name and location columns. The ssn and location information is deemed sensitive, and users should not have access to it.

You need to create a Ranger policy that allows access to the name column while excluding ssn and location. For now, this policy will be assigned to both the raj_ops and maria_dev users.
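Conceptually, a resource-based policy with excluded columns behaves like the following sketch. This is a simplified illustration of the semantics only, not Ranger's actual policy engine; the data structures and function names here are made up for this example:

```python
# Simplified model of a Ranger resource-based policy with excluded columns.
# Illustrative only -- not how Ranger is actually implemented.

POLICY = {
    "table": "employee",
    "excluded_columns": {"ssn", "location"},  # marked "exclude" in the Ranger UI
    "users": {"raj_ops", "maria_dev"},
    "permissions": {"select"},
}

def is_allowed(user, table, columns, permission="select"):
    """Return True if the policy authorizes `user` to run `permission`
    on every requested column of `table`."""
    if user not in POLICY["users"] or table != POLICY["table"]:
        return False
    if permission not in POLICY["permissions"]:
        return False
    # Access is denied if any requested column is in the excluded set.
    return not (set(columns) & POLICY["excluded_columns"])

# `select * from employee` touches ssn and location -> denied
print(is_allowed("maria_dev", "employee", ["ssn", "name", "location"]))  # False
# `select name from employee` -> allowed
print(is_allowed("maria_dev", "employee", ["name"]))  # True
```

This mirrors what you will observe below: a `select *` fails for maria_dev while `select name` succeeds.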

7. Go to Ranger UI on:
127.0.0.1:6080

ranger_homepage_rajops

8. Go back to Sandbox_hive and then Add New Policy:

new_sandbox_hive_policies

9. Enter the following values:

Policy Name - policy to restrict employee data
Hive Database - default
Hive Table - employee
Hive Column - ssn, location (NOTE: do NOT forget to toggle these columns to exclude)
Description - any description

10. In the Allow Conditions section, enter the following values:

Select Group – blank, no input
Select User – raj_ops, maria_dev
Permissions – click the + sign next to Add Permissions, check select, then click the green tick mark.

add_permission

You should have your policy configured like this:

employee_policy_rajops

11. Click Add. You can now see your new policy in the list of Sandbox_hive policies.

employee_policy_added_rajops

12. You have to disable the Hive Global Tables Allow policy to test the one you just created. Open that policy and use the toggle next to Policy Name to disable it.

hive_global_policy_rajops

13. To check the access, log back into Ambari as the maria_dev user.

maria_dev_ambari_login

14. Go directly to Hive View 2.0, open the QUERY tab, and write the Hive query to load data from the employee table:

select * from employee;

15. Click on the NOTIFICATIONS tab:

maria_dev_access_error

You will get an authorization error. This is expected, as the user does not have access to two columns in this table (ssn and location).

16. To verify this, you can also view the audits in Ranger. Go back to Ranger, click Audits => Access and select Sandbox_hive as the Service Name. You will see an Access Denied entry for maria_dev.

new_policy_audit

17. Now, back in Hive View 2.0, try running a query over a single permitted column:

SELECT name FROM employee;

maria_dev_access_successful

The query runs successfully. Even the raj_ops user cannot see the ssn and location columns; we will grant this user access to all columns later.

restrict_policy_rajops

5. Create Tag and Tag Based Policy

As a first step, log into the Atlas web UI at http://127.0.0.1:21000/ using username holger_gov and password holger_gov.

new_atlas_login

Go to the Tags tab and press the Create Tag button.
Name the tag PII and give it the description Personally Identifiable Information.

new_create_tag

Now go to the Search tab and type employee in the search box. It will list all entities related to the word employee.
Click on employee.

search_employee_rajops

You can view the details of the employee Hive table.

view_employee_hive_table

Go to the Schema tab to assign the PII tag to specific columns.

employee_schema_rajops

Press the blue + button to assign the tag to the ssn column. Then select PII from the list of tags and click Save.

add_tags_rajops

Repeat the same steps for the location row in the list above. Refresh the page and you should see both the ssn and location columns marked with the PII tag. Essentially, this means we have classified any data in the ssn and location columns as PII.

pii_tag_assigned_rajops
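The effect of tagging can be sketched as follows: Atlas supplies the column-to-tag classification, and a tag-based policy in Ranger then refers to the tag rather than to specific tables or columns. Again, this is a simplified, hypothetical model of the semantics, with made-up structures:

```python
# Simplified model of tag-based authorization: Atlas provides the
# classification (column -> tags), Ranger evaluates policies against tags.

ATLAS_TAGS = {
    ("default", "employee", "ssn"): {"PII"},
    ("default", "employee", "location"): {"PII"},
    ("default", "employee", "name"): set(),
}

# The tag-based policy we are about to build: only raj_ops may read PII.
TAG_POLICY = {"tag": "PII", "users": {"raj_ops"}, "permissions": {"select"}}

def tag_allows(user, db, table, column):
    """True if this tag policy permits (or does not restrict) the access."""
    tags = ATLAS_TAGS.get((db, table, column), set())
    if TAG_POLICY["tag"] in tags:
        return user in TAG_POLICY["users"]
    return True  # column carries no PII tag -> this policy does not restrict it

print(tag_allows("raj_ops", "default", "employee", "ssn"))    # True
print(tag_allows("maria_dev", "default", "employee", "ssn"))  # False
```

Note that the policy never mentions the employee table: retagging a new column in Atlas would extend its coverage with no Ranger change.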

Now let’s go back to the Ranger UI. The tag and entity relationship will be automatically synced to Ranger. In Ranger, we can create a tag-based policy from the top menu: go to Access Manager → Tag Based Policies.

select_tag_based_policies_rajops

You will see a page like given below.

new_tag_rajops

Click the + button to create a new tag service.

add_sandbox_tag_rajops

Name it Sandbox_tag and click Add.

added_sandbox_tag_rajops

Click on Sandbox_tag to add a policy.

add_new_policy_rajops

Click on the Add New Policy button and give the following details:

Policy Name – PII column access policy
Tag – PII
Description – Any description
Audit logging – Yes

pii_column_access_policy_rajops

In the Allow Conditions section, enter the following values:

Select Group - blank
Select User - raj_ops
Component Permissions - Select Hive

You can select the component permission through this popup:

new_allow_permissions

Please verify that the Allow Conditions section looks like this:

allow_conditions_rajops

This signifies that only raj_ops is allowed to perform any operation on columns tagged PII. Click Add.

pii_policy_created_rajops

Now click on Resource Based Policies and edit the Sandbox_hive repository by clicking the edit button next to it.

editing_sandbox_hive

Click on Select Tag Service and choose Sandbox_tag. Click Save.

new_edited_sandbox_hive

The Ranger tag-based policy is now enabled for the raj_ops user. You can test it by running a query over all columns of the employee table:

admin_access_successful

The query executes successfully. The access can be checked in the Ranger audit log, which shows the access granted and the policy that granted it. Select Sandbox_hive as the Service Name in the search bar.

audit_results_rajops

NOTE: There are two policies which provided access to the raj_ops user: one is the tag-based policy and the other is the Hive resource-based policy. The associated tag (PII) is also denoted in the Tags column of the audit record.
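The interplay of the two policies can be summarized as: tag-based policies are consulted before resource-based ones, and access is granted when either applies and allows it. The sketch below illustrates that evaluation order under the policies built in this tutorial; the helper functions are hypothetical simplifications, not Ranger code:

```python
# Hedged sketch of policy evaluation order: tag-based policies first,
# then resource-based policies; no grant means denial.

def tag_policy_decision(user, column_tags):
    # "PII column access policy": only raj_ops may read PII-tagged columns.
    if "PII" in column_tags:
        return user == "raj_ops"
    return None  # no tag policy applies to this column

def resource_policy_decision(user, table, column):
    # "policy to restrict employee data": raj_ops and maria_dev may read
    # the employee table, with ssn and location excluded.
    if table == "employee" and user in {"raj_ops", "maria_dev"}:
        return column not in {"ssn", "location"}
    return None  # no resource policy applies

def authorize(user, table, column, column_tags):
    decision = tag_policy_decision(user, column_tags)
    if decision is None:
        decision = resource_policy_decision(user, table, column)
    return bool(decision)

print(authorize("raj_ops", "employee", "ssn", {"PII"}))    # True (via tag policy)
print(authorize("maria_dev", "employee", "ssn", {"PII"}))  # False
print(authorize("maria_dev", "employee", "name", set()))   # True (resource policy)
```

This matches the audit log above: raj_ops reaches the ssn column through the tag-based policy, while maria_dev is still limited to the non-PII columns.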

Summary

Ranger traditionally provided group- or user-based authorization for resources such as a table or column in Hive or a file in HDFS.
With the new Atlas-Ranger integration, administrators can express security policies based on data classification, and not necessarily in terms of tables or columns. Data stewards can easily classify data in Atlas and use that classification in Ranger to create security policies.
This represents a paradigm shift in Hadoop security and governance, benefiting customers with mature Hadoop deployments as well as customers adopting Hadoop and big data infrastructure for the first time.