cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button

Tag based policies with Apache Ranger and Apache Atlas

Introduction

Hortonworks has recently announced the integration of Apache Atlas and Apache Ranger, and introduced the concept of tag or classification based policies. Enterprises can classify data in Apache Atlas and use the classification to build security policies in Apache Ranger.

This tutorial walks through an example of tagging data in Atlas and building a security policy in Ranger.

Prerequisites

Outline

1. Start Kafka, HBase, Ambari Infra, Ranger and Atlas

1.1 View the Services Page

Started by logging into Ambari as raj_ops user.

ambari_dashboard_rajops

From the Dashboard page of Ambari, click on Kafka from the list of installed services.

new_select_kafka

1.2 Start Kafka Service

From the Kafka page, click on Service Actions -> Start

start_kafka

Check the box and click on Confirm Start:

confirmation_kafka

Wait for Kafka to start (It may take a few minutes to turn green)

new_started_kafka

Now start other required services as well. Ranger Tagsync is stopped by default so please restart Ranger as well. Finally, start Atlas at the end. Your Ambari dashboard page should look like this:

new_ambari_dashboard

2. General Information

Ranger can be accessed using the following credentials:

User id – raj_ops
Password – raj_ops

And for Atlas:

User id – holger_gov
Password – holger_gov

In the tutorial steps below, we are going to be using user ids raj_ops and maria_dev. You can login into Ambari view using the following credentials

User id – raj_ops
Password – raj_ops

User id – maria_dev
Password – maria_dev

3. Sample Ranger Policy for Different User Personas

Navigate to Ranger UI by typing 127.0.0.1:6080 on browser, you will see a Ranger login page:

Use username – raj_ops and password – raj_ops

ranger_login_rajops

Press Sign In button, the home page of Ranger will be displayed. Click on Sandbox_hive.

click_sandbox_hive_rajops

You will now see a list of policies under Sandbox_hive repository. Click on the box of last policy which is policy for raj_ops, holger_gov, maria_dev and amy_ds

click_policy_for_all_users_rajops

This policy is meant for these 4 users and is applied to all tables and all columns of a foodmart database.

sample_foodmart_policy_rajops

To check the type of access that this policy imposed on these users, scroll down:

allow_condition_sample_policy_rajops

You can give any access to the users as per their roles in the organization.

4. Access Without Tag Based Policies

Let’s create a hive table employee from Ambari Hive View.

Go back to Ambari and then to Hive view from 9 square menu icon, and type the following create table query:

create table employee (ssn string, name string, location string)
row format delimited
fields terminated by ','
stored as textfile;

And click on green Execute button.

create_hive_table

You can check whether your table gets created or not by refreshing database explorer. Click on default database, you will see a new employee table.

list_hive_table

Now let’s put some records into this table.

First SSH into the Hortonworks Sandbox with the command:

ssh root@127.0.0.1 -p 2222

sshTerminal

Create a new file called employeedata.txt from the terminal:

vi employeedata.txt

And put following data:

111-111-111,James,San Jose
222-222-222,Mike,Santa Clara
333-333-333,Robert,Fremont

Exit the vi shell by typing Esc–>wq!
Now copy this file to HDFS where employee table data is being stored.

hadoop fs -copyFromLocal employeedata.txt /apps/hive/warehouse/employee

Now let’s go back to hive view to view this data.

Go to database explorer and press the button right next to employee table:

view_data

You will be able to see the data.

In the first scenario, you have an employee data table in Apache Hive with ssn, name and location as part of the columns. The ssn and location information is deemed sensitive and users should not have access to it.

You need to create a Ranger policy which allows for access to name column except ssn and location. This policy will be assigned to both raj_ops and maria_dev user for now.

Go to Ranger UI on:
127.0.0.1:6080

ranger_homepage_rajops

Go back to Sandbox_hive and then Add New Policy:

new_sandbox_hive_policies

Enter following values:

Policy Names - policy to restrict employee data
Hive Databases - default
table - employee
Hive_column - ssn, location (NOTE : Do NOT forget to exclude these columns)
Description - Any description

In the Allow Conditions, it should have the following values:

Select Group – blank, no input
Select User – raj_ops, maria_dev
Permissions – Click on the + sign next to Add Permissions and click on select and then green tick mark.

add_permission

You should have your policy configured like this:

employee_policy_rajops

Click on Add and you can see the list of policies that are present in Sandbox_hive.

employee_policy_added_rajops

You have to disable the Hive Global Tables Allow to test out the one that you just created. Go inside to this policy and toggle to disable it.

hive_global_policy_rajops

To check the access, get back to Ambari as maria_dev user.

maria_dev_ambari_login

Go directly to Hive View, click on default database and employee table. Press the button to view the table data.

maria_dev_access_error

You will get an authorization error. This is expected as the user does not have access to 2 columns in this table (ssn and location). To verify this, you can also view the Audits in Ranger. Go back to Ranger and click on Audits=>Access and select Sandbox_hive in Service Name. You will see the entry of Access Denied for maria_dev.

new_policy_audit

Now coming back to Hive View, try running a query for selective column.

SELECT name from employee LIMIT 100;

maria_dev_access_successful

The query runs successfully.
Even, raj_ops user cannot not see all the columns the location and SSN. We would provide access to this user to all columns later.

restrict_policy_rajops

5. Create Tag and Tag Based Policy

As a first step, login into ATLAS web app using http://127.0.0.1:21000/ and use username holger_gov and password holger_gov.

new_atlas_login

Go to Tags tab and press Create Tag button.
Give it a name as PII and description as Personal Identifiable Information.

new_create_tag

Now go to Search tab and write employee in the box. It will give all the entities related to word employee.
Click on employee

search_employee_rajops

You can view the details of an employee hive table.

view_employee_hive_table

Go to Schema tab to assign specific columns to PII tag.

employee_schema_rajops

Press blue + button to assign the tag to SSN. Then, Select PII from the list of tags and click Save.

add_tags_rajops

Repeat the same for the location row from the above list. Refresh the page, now you should see both ssn and location columns are marked with PII tag. What essentially it means that we have classified any data in the ssn and location columns as PII.

pii_tag_assigned_rajops

Now let’s go back to Ranger UI. The tag and entity relationship will be automatically inherited by Ranger. In Ranger, we can create a tag based policy by accessing it from the top menu. Go to Access Manager → Tag Based Policies.

select_tag_based_policies_rajops

You will see a page like given below.

new_tag_rajops

Click + button to create a new tag service.

add_sandbox_tag_rajops

Give it a name Sandbox_tag and click Add.

added_sandbox_tag_rajops

Click on Sandbox_tag to add a policy.

add_new_policy_rajops

Click on Add New Policy button.
Give following details:

Policy Name – PII column access policy
Tag – PII
Description – Any description
Audit logging – Yes

pii_column_access_policy_rajops

In the Allow Conditions, it should have the following values:

Select Group - blank
Select User - raj_ops
Component Permissions - Select Hive

You can select the component permission through this popup:

new_allow_permissions

Please verify that Allow Conditions section is looking like this:

allow_conditions_rajops

This signifies that only raj_ops is allowed to do any operation on the columns that are specified by PII tag. Click Add.

pii_policy_created_rajops

Now click on Resource Based Policies and edit Sandbox_hive repository by clicking on the button next to it.

editing_sandbox_hive

Click on Select Tag Service and select Sandbox_tag. Click on Save.

new_edited_sandbox_hive

The Ranger tag based policy is now enabled for raj_ops user. You can test it by running the query on all columns in employee table.

admin_access_successful

The query executes successfully. The query can be checked in the Ranger audit log which will show the access granted and associated policy which granted access. Select Service Name as Sandbox_hive in the search bar.

audit_results_rajops

NOTE: There are 2 policies which provided access to raj_ops user, one is a tag based policy and the other is hive resource based policy. The associated tags (PII) is also denoted in the tags column in the audit record).

Summary

Ranger traditionally provided group or user based authorization for resources such as table, column in Hive or a file in HDFS.
With the new Atlas -Ranger integration, administrators can conceptualize security policies based on data classification, and not necessarily in terms of tables or columns. Data stewards can easily classify data in Atlas and use in the classification in Ranger to create security policies.
This represents a paradigm shift in security and governance in Hadoop, benefiting customers with mature Hadoop deployments as well as customers looking to adopt Hadoop and big data infrastructure for first time.