Security for Enterprise Hadoop
Hadoop has become a business-critical data platform at many of the world’s largest enterprises. These corporations require four layers of security: authentication, authorization, accounting, and data protection. Hortonworks continues to innovate in each of these areas, along with other members of the Apache open source community.
Securing a Hadoop Cluster
Authentication verifies the identity of a system or user accessing the cluster
Hadoop provides two modes of authentication: simple authentication and Kerberos authentication.
Hadoop provides these capabilities while integrating with widely adopted corporate user stores (such as LDAP or Active Directory), so a single credential source can be used across the Hadoop stack.
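On a kerberized cluster, a user obtains a ticket from the KDC before issuing Hadoop commands. A sketch of that flow (the principal and realm names below are placeholders, not values from this document):

```shell
# Obtain a Kerberos ticket-granting ticket for the user principal.
# "alice@EXAMPLE.COM" is a placeholder; substitute your own principal and realm.
kinit alice@EXAMPLE.COM

# Verify that a ticket was granted and inspect its expiry.
klist

# Subsequent Hadoop commands authenticate using the cached ticket;
# without a valid ticket they fail with an authentication error.
hdfs dfs -ls /user/alice
```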
Authorization specifies access privileges for a user or system
Knox Gateway 0.4.0 introduces the features that enterprise security officers expect for perimeter security of a Hadoop cluster. Knox supports lookup of enterprise group permissions, introduces service-level access control, adds protection against common web application vulnerabilities, and provides a pluggable auditing facility.
The various Apache projects in a Hadoop distribution also include their own access control features. HDFS has file permissions for fine-grained authorization. MapReduce includes resource-level access control via ACLs. For data, Apache HBase provides authorization with ACLs on tables and column families, and Apache Accumulo extends this further to cell-level access control. Apache Hive provides coarse-grained access control at the table level.
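As a sketch of HDFS fine-grained authorization (assuming a release with HDFS ACL support, such as HDP 2.1), extended ACLs beyond the owner/group/other permission bits are managed with `setfacl` and `getfacl`; the paths, user, and group below are illustrative:

```shell
# Grant read access on a file to an additional user beyond owner/group.
hdfs dfs -setfacl -m user:bob:r-- /data/sales/q1.csv

# Grant read and traverse access on a directory to a group.
hdfs dfs -setfacl -m group:analysts:r-x /data/sales

# Inspect the resulting ACL entries on the file.
hdfs dfs -getfacl /data/sales/q1.csv
```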
Accounting tracks resource use within a Hadoop system
For security compliance or forensics, insight into historical data access events is critical. HDFS and MapReduce provide base audit support. The Apache Hive metastore records an audit trail of who interacts with Hive and when those interactions occur. Finally, Apache Oozie, the workflow engine, provides an audit trail for services.
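For example, HDFS records each file-system access in its audit log. An illustrative entry is shown below (the timestamp, user, IP, and path are invented for illustration, and the exact field layout varies by version):

```
2014-04-01 10:15:32,501 INFO FSNamesystem.audit: allowed=true  ugi=alice (auth:KERBEROS)  ip=/10.0.0.12  cmd=open  src=/apps/hive/warehouse/sales/000000_0  dst=null  perm=null
```

Each entry captures whether access was allowed, the authenticated user, the client address, the operation, and the paths involved, which is the raw material for compliance reporting and forensic reconstruction.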
Data protection ensures privacy and confidentiality of information
Hadoop and HDP allow you to protect data both in motion and at rest. HDP provides encryption for channels such as Remote Procedure Call (RPC), HTTP, JDBC/ODBC, and Data Transfer Protocol (DTP) to protect data in motion. For data at rest, HDFS and Hadoop can rely on encryption at the operating system level.
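A minimal sketch of enabling wire encryption for two of these channels through standard Hadoop configuration properties (the values shown are illustrative; consult your distribution's documentation before changing them):

```xml
<!-- core-site.xml: encrypt RPC traffic
     ("privacy" adds encryption on top of authentication and integrity) -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml: encrypt the HDFS Data Transfer Protocol
     used between clients and DataNodes -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
```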
Owen O’Malley, Devaraj Das, and Sanjay Radia co-wrote the original Hadoop security specification in 2011. Since then, Hortonworks developers and contributors from the open source community have delivered core Kerberos functions in Hadoop, then augmented this work with delegation tokens, capability-like access tokens, and the notion of trust for auxiliary services.
Continuing this leadership, the team at Hortonworks incubated the Apache Knox Gateway project in February 2013 to create a security perimeter for REST/HTTP access to Hadoop. Apache Knox version 0.4.0 will ship as a fully supported and certified component of HDP 2.1.
Previously delivered:
- Strong authentication via Kerberos
- Basic authorization for HBase, Hive, and HDFS
- Encryption with SSL for NameNode, JobTracker, etc.
- Wire encryption for Shuffle, HDFS, and JDBC
Delivered with Knox 0.4.0 (HDP 2.1):
- ACLs for HDFS
- Knox: Hadoop REST API security
- SQL-style Hive authorization
- SSL support for HiveServer2
- SSL for DataNode/NameNode UIs and WebHDFS
- PAM support for Hive
Phase 3 of the Hadoop security roadmap will deliver:
- Audit event correlation and an audit viewer
- Token-based authentication beyond Kerberos
- Data encryption in HDFS, HBase, and Hive
- Knox support for HDFS HA, Ambari, and Falcon