March 01, 2017

It’s Morphing Time: Apache Ranger Graduates to a Top Level Project – Part 2

Apache Ranger’s graduation to a top-level project (TLP) is just one step in a longer journey to help enterprises across industries secure their big data platforms with a modern, open source authorization and audit framework. Below are the highlights of the capabilities currently available in Apache Ranger:

1. Apache Ranger is a centralized framework to define, administer and manage security policies consistently across Hadoop components. The community has built the most comprehensive security coverage across the Hadoop ecosystem: HDFS, Apache Hive, Apache HBase, Apache YARN, Apache Kafka, Apache Solr, Apache Storm, Apache Knox, Apache NiFi, and Apache Atlas are all supported natively by Apache Ranger today. Apache Ranger’s approach to authorization is based on attribute-based access control (ABAC), in which an access decision combines attributes of the subject, the action, the resource, and the environment. Using descriptive attributes of subjects, resources, and the environment, such as AD group, Apache Atlas-based tags or classifications, and geo-location, Apache Ranger provides a modern and superior policy approach that goes beyond simple role-based access control (RBAC). This ABAC approach is also consistent with the recommendations in NIST Special Publication 800-162. It enables compliance personnel and security administrators to define precise and intuitive security policies at a very fine-grained level for each resource (Hive database/table/column, HBase table/column family/column, Kafka topic, YARN queue, Solr index, HDFS file/folder, etc.) and to use additional user context data (IP addresses, AD groups, etc.) to administer policies easily. By leveraging ABAC, Apache Ranger overcomes the pitfalls of traditional RBAC models, which place a heavy burden on security administrators and lead to role proliferation and manageability issues.

2. Apache Ranger follows the industry best practice of least privilege when setting up policies: users are denied access by default unless a policy grants them specific access (for example, a user may have Select but not Update privileges). Apache Ranger further strengthens this practice by making Deny conditions supersede Allow conditions by default. Because policies support deny/allow conditions along with specific exclude/include conditions, security and compliance administrators can achieve truly fine-grained access control with a small set of easily understandable policies! In some cases, what would have required a dozen roles and permissions can now be expressed as a single simple policy in Apache Ranger’s robust policy framework!

3. Apache Ranger provides easy extension points for the community and partners to add authorization for new systems, even those outside the Hadoop ecosystem. Its plugin architecture makes it simple to extend Ranger’s authorization model via lightweight plugins that run locally, in the context of the resources being authorized. The Ranger community has also made it easy to add custom dynamic policy conditions (such as prohibiting toxic joins) and user context enrichers (such as geo-location and time-of-day mappings) to further enrich the security policy framework.

4. Apache Ranger also provides a Key Management Service (KMS), 100% compatible with Hadoop’s native KMS API, to store and manage encryption keys for HDFS Transparent Data Encryption. Ranger KMS can interoperate with Hardware Security Modules (HSMs) and other trust anchors, making it easier for enterprise customers to adopt.

5. Apache Ranger is the central audit location for all access requests across all the services it authorizes. Its comprehensive audit framework records rich event data for each access request, along with contextual metadata such as the data classifications of the resources accessed, the client IP, the locale, and the specific policy (and policy version) that granted or denied access. The audit framework can also scale as the cluster grows. In addition, it offers a real-time visual query interface via Solr, where security administrators can perform forensic analysis of security events and glean insights. This makes Apache Ranger’s approach very flexible: users can either post-process and visualize the raw event data as they need, or use Ranger’s visual query interface directly. Customers also gain flexibility in how they retain and archive event data, and can scale out auditing with the size of the cluster and the volume of events.

6. Apache Ranger provides advanced security for Apache Hive via dynamic column masking and row filtering. These capabilities obviate the need to manage thousands of individual data views for consumers throughout the organization, and they provide a more agile way of presenting customized views of data than an inflexible mapping of data to user groups.

a. Apache Ranger provides dynamic data masking via simple, intuitive masking policies for Hive columns, so that authorized users see the data they are permitted to see, while for other users or groups the same data is masked or anonymized in a variety of flexible formats to protect sensitive content. Masking policies define which data fields are masked and the rules for anonymizing or pseudonymizing them.

b. Apache Ranger provides row-level security through row-filtering policies that are applied as a behind-the-scenes query filter condition, narrowing the set of Hive table rows returned in a query’s output. These filtering conditions are always on and are evaluated automatically on every access, so security administrators no longer need to add filtering predicates manually or create multiple views.

7. Apache Ranger also provides the industry’s first 100% open source classification-based (tag-based) policies for Hadoop ecosystem components, via integration with Apache Atlas, an open metadata and data governance framework that is part of the Hadoop stack. This integration lets data stewards and administrators separate resource classification from access authorization, that is, base data access on a resource’s classification (e.g. sensitive data such as PII or PCI) rather than on the resource type itself. Policies are therefore enforced automatically and dynamically based on classification: when an entity’s classification is updated, the policy itself does not have to be updated individually for a large number of users or scenarios. For example, resources (HDFS files/directories, Hive databases/tables/columns, etc.) containing sensitive data can be tagged with PII/PCI/PHI labels when the data enters the Hadoop ecosystem or at any time later. Once a resource is tagged, the authorization policy for the tag is enforced automatically, eliminating the need to create or update policies for that resource. Resource classification also lets a single authorization policy for a metadata tag authorize access to resources across various Hadoop components, a “write once, apply many times” model for policy authoring that eliminates the need to create separate policies in each component!
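The first two items describe how an access decision is reached: attributes of the request are matched against policies, access is denied unless something grants it, and a matching deny overrides any allow. That evaluation order can be sketched in a few lines of Python. This is a toy model for illustration, not Ranger’s policy engine or its API: the class names, the `is_allowed` function, the wildcard resource pattern, and the `excluded_users` field (a simplified stand-in for the exclude conditions mentioned above) are all assumptions.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass
class AccessRequest:
    """ABAC attributes of one request: subject, action, resource, environment."""
    user: str
    groups: FrozenSet[str]
    action: str
    resource: str
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class PolicyItem:
    """Hypothetical policy item: matches requests from the given groups for the
    given actions, minus excluded users, when an optional condition holds."""
    groups: Set[str]
    actions: Set[str]
    excluded_users: Set[str] = field(default_factory=set)
    condition: Callable[[AccessRequest], bool] = lambda req: True

    def matches(self, req: AccessRequest) -> bool:
        return (bool(self.groups & req.groups)
                and req.action in self.actions
                and req.user not in self.excluded_users
                and self.condition(req))


@dataclass
class Policy:
    resource_pattern: str  # wildcard pattern, e.g. "sales.customers.*"
    allow: List[PolicyItem] = field(default_factory=list)
    deny: List[PolicyItem] = field(default_factory=list)


def is_allowed(policies: List[Policy], req: AccessRequest) -> bool:
    applicable = [p for p in policies if fnmatchcase(req.resource, p.resource_pattern)]
    # Deny supersedes allow: any matching deny item refuses the request.
    if any(item.matches(req) for p in applicable for item in p.deny):
        return False
    # Least privilege: with no matching allow item, access is denied by default.
    return any(item.matches(req) for p in applicable for item in p.allow)


# Analysts may SELECT customer columns, except user "carol" (an exclusion),
# and never from outside the US (an environment attribute).
policies = [Policy(
    resource_pattern="sales.customers.*",
    allow=[PolicyItem({"analysts"}, {"select"}, excluded_users={"carol"})],
    deny=[PolicyItem({"analysts"}, {"select"},
                     condition=lambda req: req.env.get("country") != "US")],
)]


def analyst_request(user: str, action: str, country: str = "US") -> AccessRequest:
    return AccessRequest(user, frozenset({"analysts"}), action,
                         "sales.customers.email", {"country": country})


print(is_allowed(policies, analyst_request("alice", "select")))        # True
print(is_allowed(policies, analyst_request("alice", "update")))        # False: no grant
print(is_allowed(policies, analyst_request("alice", "select", "DE")))  # False: deny wins
print(is_allowed(policies, analyst_request("carol", "select")))        # False: excluded
```

Checking deny items before allow items is what lets a single deny condition, such as the location rule here, override any number of grants without introducing extra roles.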
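The context enrichers and custom policy conditions described in item 3 can be pictured as a small pipeline: enrichers add derived attributes to a request, and a condition is a predicate over the enriched attributes. Ranger’s real extension points are Java classes loaded by its plugins; the Python sketch below is only a conceptual analogue, and `GeoEnricher`, `TimeOfDayEnricher`, `business_hours_in`, and the CIDR-to-country table are hypothetical.

```python
import ipaddress
from datetime import datetime
from typing import Callable, Dict, List, Protocol

# Hypothetical request context: a mutable bag of attributes.
Context = Dict[str, object]


class ContextEnricher(Protocol):
    """Adds derived attributes to the context before conditions run."""
    def enrich(self, ctx: Context) -> None: ...


class GeoEnricher:
    """Maps the client IP to a country using a hypothetical CIDR table."""
    def __init__(self, cidr_to_country: Dict[str, str]) -> None:
        self.table = [(ipaddress.ip_network(cidr), country)
                      for cidr, country in cidr_to_country.items()]

    def enrich(self, ctx: Context) -> None:
        ip = ipaddress.ip_address(ctx["client_ip"])
        ctx["country"] = next(
            (country for net, country in self.table if ip in net), "UNKNOWN")


class TimeOfDayEnricher:
    """Derives the hour of day from the access timestamp."""
    def enrich(self, ctx: Context) -> None:
        ctx["hour"] = ctx["access_time"].hour


# A custom condition is a predicate over the enriched context.
Condition = Callable[[Context], bool]


def business_hours_in(countries: set) -> Condition:
    return lambda ctx: ctx["country"] in countries and 9 <= ctx["hour"] < 18


def evaluate(ctx: Context, enrichers: List[ContextEnricher], condition: Condition) -> bool:
    # Enrich first, then evaluate the condition on the enriched context.
    for enricher in enrichers:
        enricher.enrich(ctx)
    return condition(ctx)


enrichers = [GeoEnricher({"10.1.0.0/16": "US", "10.2.0.0/16": "DE"}), TimeOfDayEnricher()]
us_office_hours = business_hours_in({"US"})
print(evaluate({"client_ip": "10.1.4.7", "access_time": datetime(2017, 3, 1, 10, 30)},
               enrichers, us_office_hours))  # True
```

Keeping enrichment separate from the condition means a new attribute source can be added without rewriting existing conditions.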
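Because the audit framework described in item 5 exposes raw event data, forensic questions can also be answered with ordinary post-processing outside the visual query interface. The sketch below assumes a hypothetical JSON-lines event layout (the field names are illustrative, not Ranger’s actual audit schema) and counts denied requests per user.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class AuditEvent:
    """One access decision plus context (hypothetical field names)."""
    event_time: str
    user: str
    service: str                   # e.g. "hive" or "hdfs"
    resource: str
    access: str
    allowed: bool
    policy_id: Optional[int]       # None when no policy matched (default deny)
    policy_version: Optional[int]
    client_ip: str
    tags: List[str] = field(default_factory=list)  # classifications of the resource


def to_jsonl(events: List[AuditEvent]) -> str:
    """Serialize events as JSON lines, a convenient raw format for post-processing."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def denials_per_user(jsonl: str) -> Counter:
    """Forensic query over raw events: number of denied requests per user."""
    records = (json.loads(line) for line in jsonl.splitlines() if line.strip())
    return Counter(r["user"] for r in records if not r["allowed"])


events = [
    AuditEvent("2017-03-01T10:00:00Z", "alice", "hive", "sales.customers.ssn",
               "select", True, 12, 3, "10.1.4.7", ["PII"]),
    AuditEvent("2017-03-01T10:05:00Z", "bob", "hive", "sales.customers.ssn",
               "select", False, 12, 3, "10.2.0.9", ["PII"]),
    AuditEvent("2017-03-01T10:06:00Z", "bob", "hdfs", "/data/raw/customers.csv",
               "read", False, None, None, "10.2.0.9", ["PII"]),
]
print(denials_per_user(to_jsonl(events)))  # Counter({'bob': 2})
```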
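The Hive masking and row-filtering behavior described above can be modeled over in-memory rows to show how one table yields different views for different groups. This is a hypothetical sketch: Ranger applies these policies inside Hive’s query processing, whereas here the group names, the `MASK_POLICY` and `ROW_FILTERS` tables, the first-match rule for row filters, and the masking functions (letters become x/X, digits become n) are assumptions made for illustration.

```python
import hashlib
import re
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]


def redact(value: str) -> str:
    """Assumed redaction rule: A-Z -> X, a-z -> x, 0-9 -> n; other characters kept."""
    value = re.sub(r"[A-Z]", "X", value)
    value = re.sub(r"[a-z]", "x", value)
    return re.sub(r"[0-9]", "n", value)


def show_last_4(value: str) -> str:
    """Redact everything except the last four characters."""
    return redact(value[:-4]) + value[-4:]


def sha256_hex(value: str) -> str:
    """Pseudonymize: the same input always maps to the same opaque token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


MASKERS: Dict[str, Callable[[str], Optional[str]]] = {
    "redact": redact, "show_last_4": show_last_4, "hash": sha256_hex,
    "nullify": lambda value: None,
}

# Hypothetical policies for a "customers" table.
# column -> (groups that see clear text, mask type applied for everyone else)
MASK_POLICY = {
    "ssn": ({"compliance"}, "show_last_4"),
    "email": ({"compliance", "support"}, "hash"),
}
# group -> row predicate, standing in for a SQL WHERE clause; first match wins.
ROW_FILTERS: Dict[str, Callable[[Row], bool]] = {
    "sales_us": lambda row: row["region"] == "US",
    "sales_eu": lambda row: row["region"] == "EU",
}


def apply_policies(rows: List[Row], user_groups: Set[str]) -> List[Row]:
    """Return the filtered, masked view of `rows` that `user_groups` may see."""
    row_filter = next((f for g, f in ROW_FILTERS.items() if g in user_groups), None)
    view = []
    for row in rows:
        if row_filter is not None and not row_filter(row):
            continue  # row-level security: this row is invisible to the user
        visible = dict(row)  # mask a copy; the stored data is never modified
        for column, (clear_groups, mask_type) in MASK_POLICY.items():
            if column in visible and not (clear_groups & user_groups):
                visible[column] = MASKERS[mask_type](visible[column])
        view.append(visible)
    return view


rows = [
    {"name": "Ann", "region": "US", "ssn": "123-45-6789", "email": "ann@example.com"},
    {"name": "Bo", "region": "EU", "ssn": "987-65-4321", "email": "bo@example.com"},
]
print(apply_policies(rows, {"sales_us"})[0]["ssn"])  # nnn-nn-6789
```

The same two policy tables serve every group, which is the point of the feature: no per-group views have to be created or maintained.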
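The tag-based model in item 7 separates what a resource is (its classification) from who may access it. In the hypothetical sketch below, a dictionary stands in for classifications synchronized from Apache Atlas, and one policy per tag applies to HDFS, Hive, and Kafka resources alike. The resource naming scheme and the `tag_decision` function are assumptions, and the interaction with resource-based policies is deliberately left out.

```python
from typing import Dict, Optional, Set

# Hypothetical stand-in for classifications synchronized from Apache Atlas:
# resource identifier -> set of tags. Identifiers span several components.
resource_tags: Dict[str, Set[str]] = {
    "hdfs:/data/raw/customers.csv": {"PII"},
    "hive:sales.customers.ssn": {"PII"},
    "hive:sales.orders.amount": set(),
}

# One policy per tag, written once and applied to every component:
# tag -> groups allowed to access resources carrying that tag.
tag_policies: Dict[str, Set[str]] = {
    "PII": {"privacy_officers"},
    "PCI": {"payments_team"},
}


def tag_decision(resource: str, user_groups: Set[str]) -> Optional[bool]:
    """True/False if a tag policy governs the resource; None if no tag applies
    (resource-based policies would then decide, which this sketch omits)."""
    governed = [t for t in resource_tags.get(resource, set()) if t in tag_policies]
    if not governed:
        return None
    # Every governing tag must grant access to at least one of the user's groups.
    return all(tag_policies[t] & user_groups for t in governed)


print(tag_decision("hive:sales.customers.ssn", {"analysts"}))              # False
print(tag_decision("hdfs:/data/raw/customers.csv", {"privacy_officers"}))  # True

# Classifying a new resource later enforces the existing PII policy on it,
# without creating or editing any policy.
resource_tags["kafka:customer-events"] = {"PII"}
print(tag_decision("kafka:customer-events", {"analysts"}))                 # False
```

Tagging the Kafka topic changes the decision for that resource without touching `tag_policies`, which is the “write once, apply many times” property in miniature.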

Data lakes require new patterns of data access, with deep implications for how data security can be applied and scaled independently. From an architectural perspective, an attribute-based model is ideally suited to such large data lake environments, which makes Apache Ranger an attractive choice for securing data lakes from an access control and audit perspective. Apache Ranger also benefits from over 4 years of community-based, meritocratic development and has matured through numerous enterprise deployments. With this maturity, Apache Ranger is poised to address the seemingly daunting challenges of Hadoop data security without compromising the benefits of democratized data access. Once more, congratulations to the Apache Ranger community on achieving this historic milestone! We at Hortonworks look forward to continuing to work closely with the community and our customers to move the project forward.

Interested in learning more about Apache Ranger? Check out http://ranger.apache.org/, ask questions on https://community.hortonworks.com or on the dev@ranger.apache.org mailing list, or download the latest stable version of Apache Ranger from https://github.com/apache/ranger/
