As YARN drives Hadoop’s emergence as a business-critical data platform, the enterprise requires more stringent data security capabilities. Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides a platform for centralized security policy administration across the core enterprise security requirements of authorization, audit and data protection.
On June 10th, the community announced the release of Apache Ranger 0.5.0. With this release, the community took major steps to extend security coverage for the Hadoop platform and deepen its existing security capabilities. Apache Ranger 0.5.0 addresses over 194 JIRA issues and delivers many new features, fixes and enhancements. Among these improvements, the following features are notable:
This blog provides an overview of the new features and how they integrate with other Hadoop services, and previews the focus areas that the community has planned for upcoming releases.
Administrators can now use Apache Ranger’s centralized platform to manage access policies for Solr (collection level), Kafka (topic level) and YARN (Capacity Scheduler queues). These centralized authorization and auditing capabilities add to what was previously available for HDFS, HBase, Hive, Knox and Storm. As a precursor to this release, the Hortonworks security team worked closely with the community to build authentication support (Kerberos) and authorization APIs in Apache Solr and Apache Kafka.
Administrators can now apply security policies to protect queues in Kafka and ensure that only authorized users are able to publish to or consume from a Kafka topic. Similarly, Ranger can be used to control query access at the Solr collection level, ensuring sensitive data in Apache Solr is secured in production environments. Apache Ranger’s integration with the YARN ResourceManager enables administrators to control which applications can submit to a queue and prevent rogue applications from using YARN.
In this release, HDP takes a major step forward in meeting enterprises’ requirements for security and compliance by introducing transparent data encryption (TDE) for HDFS files, combined with an open source Hadoop KMS embedded in Ranger. Ranger now gives security administrators the ability to manage keys and authorization policies for KMS.
This encryption feature in HDFS, combined with KMS access policies maintained by Ranger, prevents rogue Linux or Hadoop administrators from accessing data and supports segregation of duties for both data access and encryption. You can find more details on TDE through this blog.
As enterprises’ Hadoop deployments mature, there is a need to move from static role-based access control to access based on dynamic rules. An example would be to provide access based on time of day (9am to 5pm), on geography (access only if logged in from a particular location) or even on data values.
In Apache Ranger 0.5.0, the community took the first step towards a true ABAC (attribute-based access control) model by introducing hooks to manage dynamic policies, thereby providing a framework for users to control access based on dynamic rules. Users can now specify their own conditions and rules (similar to a UDF) as part of service definitions, and these conditions can vary by service (HDFS, Hive, etc.). In the future, based on community feedback, Apache Ranger might include some of the conditions out of the box.
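To make the idea of dynamic conditions concrete, here is a minimal Python sketch of a policy that combines a static user/resource match with pluggable conditions such as time of day and location. All names here (`AccessRequest`, `Policy`, `time_of_day`, `from_location`) are hypothetical and chosen for illustration only; this is not Ranger's plugin API, just one way the concept could be modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import Callable, Dict, List, Set


@dataclass
class AccessRequest:
    """Hypothetical access request; not a Ranger class."""
    user: str
    resource: str
    access_type: str
    when: datetime
    context: Dict[str, str] = field(default_factory=dict)


# A condition is any function that inspects a request and returns True or False.
Condition = Callable[[AccessRequest], bool]


def time_of_day(start: time, end: time) -> Condition:
    """Build a condition that passes only when start <= request time < end."""
    def check(req: AccessRequest) -> bool:
        return start <= req.when.time() < end
    return check


def from_location(allowed: Set[str]) -> Condition:
    """Build a condition that passes only for requests from an allowed location."""
    def check(req: AccessRequest) -> bool:
        return req.context.get("location") in allowed
    return check


@dataclass
class Policy:
    """Hypothetical policy: a static match plus a list of dynamic conditions."""
    resource: str
    users: Set[str]
    access_types: Set[str]
    conditions: List[Condition] = field(default_factory=list)

    def allows(self, req: AccessRequest) -> bool:
        static_match = (
            req.resource == self.resource
            and req.user in self.users
            and req.access_type in self.access_types
        )
        # Every dynamic condition must also hold for access to be granted.
        return static_match and all(cond(req) for cond in self.conditions)
```

In this sketch a policy with `time_of_day(time(9), time(17))` would grant a matching request at 10:30 and deny the same request at 18:00, which mirrors the 9am to 5pm example above.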
Apache Ranger 0.5.0 provides the ability to protect metadata listing in Hive based on underlying permissions. This functionality is especially relevant for multi-tenant environments where users must not be able to view other tenants’ metadata (tables, columns).
The following commands related to Hive metadata will now return only the information that the user’s privileges permit.
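As a rough illustration of permission-based metadata filtering, the Python sketch below returns only the tables a user has been granted access to. The grant table, the user names and the `visible_tables` helper are all hypothetical; the sketch shows the filtering idea only and does not reflect how the Hive plugin is implemented.

```python
from typing import Dict, Iterable, List, Set


def visible_tables(user: str,
                   all_tables: Iterable[str],
                   grants: Dict[str, Set[str]]) -> List[str]:
    """Return the subset of all_tables that `user` is allowed to see.

    `grants` is a hypothetical mapping from user name to the set of
    fully qualified table names that user may access.
    """
    allowed = grants.get(user, set())
    # Preserve the catalog's ordering; drop anything the user has no grant on.
    return [table for table in all_tables if table in allowed]


# Hypothetical catalog and grants for two tenants.
CATALOG = ["tenant_a.orders", "tenant_a.customers", "tenant_b.invoices"]
GRANTS = {
    "tenant_a_user": {"tenant_a.orders", "tenant_a.customers"},
    "tenant_b_user": {"tenant_b.invoices"},
}
```

With these assumed grants, `visible_tables("tenant_b_user", CATALOG, GRANTS)` lists only `tenant_b.invoices`, and a user with no grants sees an empty list.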
Currently, the Apache Ranger UI provides the ability to perform interactive queries against audit data stored in an RDBMS. In this release, we are introducing support for storing and querying audit data in Solr. This functionality removes the dependency on a database for audit and gives users visibility into audit data in Solr using dashboards built on the Banana UI. We recommend that users enable audit writing to both Solr and HDFS, and purge data in Solr at regular intervals.
Auditing all events or jobs in Hadoop generates a high volume of audit data. Apache Ranger 0.5.0 provides the ability to summarize audit data at the source for a given time period, by user, resource accessed and action, thereby reducing audit data volume and noise, and lessening the impact on underlying storage for improved performance.
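The summarization described above can be pictured as grouping events into fixed time windows and counting them per user, resource and action. The following Python sketch is one possible way to do that; the `AuditEvent` and `AuditSummary` types and the `summarize` function are hypothetical and are not taken from Ranger's audit framework.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

EPOCH = datetime(1970, 1, 1)


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical raw audit record."""
    timestamp: datetime
    user: str
    resource: str
    action: str


@dataclass(frozen=True)
class AuditSummary:
    """Hypothetical summarized record: one row per window/user/resource/action."""
    window_start: datetime
    user: str
    resource: str
    action: str
    count: int


def summarize(events: Iterable[AuditEvent], window: timedelta) -> List[AuditSummary]:
    """Collapse events that share a time window, user, resource and action."""
    counts: Counter = Counter()
    for event in events:
        # Align the timestamp to the start of its window.
        offset = (event.timestamp - EPOCH) // window
        window_start = EPOCH + offset * window
        counts[(window_start, event.user, event.resource, event.action)] += 1
    return [AuditSummary(*key, count=n) for key, n in sorted(counts.items())]
```

Under this scheme, many repeated reads of the same file by the same user within one window are stored as a single row with a count, which is where the reduction in volume comes from.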
As part of this release, the Ranger community worked extensively to revamp the Apache Ranger architecture. As a result of this effort, Apache Ranger 0.5.0 now provides a pluggable architecture for policy administration and enforcement. Using a “single pane of glass,” end-users can configure and manage their security across all components of their Hadoop stack and extend it to their entire big data environment.
Apache Ranger 0.5.0 enables customers and partners to easily add a new “service” to support a new component or data engine. The service is defined and configured in JSON.
Users can create a custom service as a plug-in to any data store, and build and manage services centrally for their big data BI applications.
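To give a feel for what a JSON-based service definition might contain, the Python sketch below loads and sanity-checks a small definition for an imaginary data store called `mystore`. The field names (`name`, `resources`, `accessTypes`, `parent`, `level`) and the `load_service_def` helper are assumptions made for this illustration; consult the Ranger documentation for the actual schema before writing a real definition.

```python
import json
from typing import Any, Dict

# Hypothetical service definition for an imaginary "mystore" data store.
SERVICE_DEF_JSON = """
{
  "name": "mystore",
  "resources": [
    {"name": "bucket", "level": 1, "parent": null},
    {"name": "object", "level": 2, "parent": "bucket"}
  ],
  "accessTypes": [
    {"name": "read"},
    {"name": "write"}
  ]
}
"""


def load_service_def(text: str) -> Dict[str, Any]:
    """Parse a service definition and check its basic shape (illustrative only)."""
    service_def = json.loads(text)
    for key in ("name", "resources", "accessTypes"):
        if key not in service_def:
            raise ValueError(f"service definition is missing '{key}'")
    resource_names = {res["name"] for res in service_def["resources"]}
    for res in service_def["resources"]:
        parent = res.get("parent")
        # Every non-root resource must point at a resource declared in this file.
        if parent is not None and parent not in resource_names:
            raise ValueError(f"resource '{res['name']}' has unknown parent '{parent}'")
    return service_def
```

The point of the sketch is that a new component is described as data (its resource hierarchy and the access types that policies can grant) rather than as changes to the policy administration code.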
The Apache Ranger release would not have been possible without contributions from the dedicated community members, who have done a great job of understanding the needs of the user community and delivering on them. Based on demand from the user community, we will continue to focus our efforts in three primary areas: