Comprehensive and Coordinated Security for Enterprise Hadoop

Central administration and coordinated enforcement of security policy

With the emergence of Hadoop as a business-critical data platform, enterprises are imposing more stringent requirements for data security. Hadoop already meets many of these requirements, but the work is not done.

Hortonworks has already contributed a wide range of security functions, including Kerberos within Apache Hadoop, GRANT/REVOKE commands in Apache Hive, and the Apache Knox project for perimeter security, among many other features. Recently, we acquired XA Secure to extend this already rich set of features with central administration and coordinated enforcement of security policy across the entire Hadoop ecosystem of projects. And as part of our promise to keep HDP completely open, we will incubate the XA Secure functionality as a project governed by the Apache Software Foundation.

Together with the open source community, we will continue to pursue three security goals.

Comprehensive Security
Meet all security requirements across authentication, authorization, audit & data protection for all HDP components.
Central Administration
Provide one location for administering security policies and for viewing and managing audit across the platform.
Consistent Integration
Integrate with other security and identity management systems, for compliance with IT policies.

Already Delivered

Through individual Apache projects and the acquisition of XA Secure, Hortonworks has already delivered key pieces of the security roadmap in five areas.

Centralized Security Administration
Security best practices should be applied consistently across the platform and managed centrally through a single user interface. With XA Secure, HDP Advanced Security now provides a security administration console that is unique to HDP but will be delivered completely in the open for all. Hadoop administrators can now manage all access-control security policies in one place.

Authentication
HDP Advanced Security provides support for Kerberos-based authentication. Kerberos can be connected to corporate LDAP environments to centrally provision user information. HDP also provides perimeter authentication through Apache Knox for REST APIs and Web services.
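As a sketch of what perimeter authentication looks like in practice, a client reaches HDFS through the Knox gateway's single HTTPS endpoint instead of contacting the NameNode directly. The hostname, port, topology name (`default`), path, and guest credentials below are illustrative assumptions, not values from any specific deployment:

```shell
# List an HDFS directory through the Knox gateway. Knox authenticates the
# caller at the perimeter (here with HTTP Basic auth against LDAP) and
# proxies the WebHDFS call into the cluster on the user's behalf.
curl -ik -u guest:guest-password \
  'https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS'
```

Because all REST traffic funnels through the gateway, the cluster's internal hosts never need to be exposed to client networks.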

Authorization
Authorization, or entitlement, is the process of ensuring that users can access only the data permitted by corporate policy. Hadoop already provides fine-grained authorization via file permissions in HDFS, resource-level access control for YARN and MapReduce, and coarser-grained access control at the service level. HBase provides authorization with ACLs on tables and column families, while Accumulo extends this further to cell-level control. Apache Hive provides GRANT/REVOKE access control on tables.

With the addition of XA Secure, Hadoop now includes authorization features that help enterprises securely share varied data among multiple user groups while ensuring proper entitlements. XA Secure provides an intuitive way to specify entitlement policies for HDFS, HBase, and Hive through a centralized administration interface and extended authorization enforcement. Our goal is a common authorization framework for the HDP platform, giving security administrators a single console to manage all authorization policies for HDP components.

Audit
One of the cornerstones of any security system is accountability: audit data that lets auditors verify controls and check regulatory compliance, for example HIPAA in healthcare. A healthcare provider can search the audit trail for access history on sensitive data such as patient records, and produce that history when a patient or a regulatory authority requests it. Robust audit data helps enterprises meet their regulatory compliance needs and proactively monitor the environment. XA Secure provides a centralized framework for collecting access audit history and reporting on it easily, with filtering on various parameters. Our goal is to enrich the audit information captured by the various Hadoop components and to provide insight through centralized reporting.

Data protection
Data protection covers data at rest and in motion, and includes encryption and masking. Encryption adds a layer of security by protecting data both as it is transferred and as it is stored (at rest), while masking enables security administrators to desensitize personally identifiable information (PII) for display or temporary storage. We will continue to leverage the existing capabilities in HDP for encrypting data in flight, while bringing forward partner solutions for encrypting data at rest, data discovery, and data masking.
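As an illustration, HDP's existing in-flight encryption is switched on with standard Hadoop configuration properties. This is a hedged fragment rather than a complete hardening guide; it assumes the cluster is already running in secure (Kerberos) mode:

```xml
<!-- hdfs-site.xml: encrypt the HDFS block data transfer protocol -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<!-- core-site.xml: "privacy" encrypts Hadoop RPC traffic
     (versus "authentication" or "integrity") -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
```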

Coming Next

The next phase on the Hadoop security roadmap will deliver:

  • Encryption in HDFS, Hive & HBase
  • Centralized security administration for all Hadoop components
  • Expanded audit coverage of more operations, with audit correlation
  • Additional SSO integration choices
  • An alternative to Kerberos-based authentication


Essential Timeline

Previous Phases
  • Kerberos Authentication
  • HBase, Hive & HDFS authorization
  • Wire Encryption for HDFS, Shuffle & JDBC
  • Basic audit in HDFS & MR
  • ACLs for HDFS
  • Knox: Hadoop REST API Security
  • SQL-style Hive Authorization
  • Expanded Wire Encryption for HiveServer2 & WebHDFS
XA Secure Phase
  • Centralized Security Administration for HDFS, HBase & Hive
  • Centralized Audit Reporting
  • Delegated Policy Administration
Future Phases
  • Encryption in HDFS, Hive & HBase
  • Centralized security administration for all Hadoop components
  • Expanded audit coverage of more operations, with audit correlation
  • Additional SSO integration choices
  • Tag-based global policies

