Apache Hadoop has emerged as a critical data platform for delivering business insights hidden in big data. Because Hadoop is a relatively new technology, system administrators hold it to higher security standards. There are several reasons for this scrutiny:
HDP is the only Hadoop platform offering comprehensive security and centralized administration of security policies across the entire stack. At Hortonworks we take a holistic view of enterprise security requirements and ensure that Hadoop can not only define but also apply a comprehensive policy. HDP leverages Apache Ranger for centralized security administration, authorization and auditing; Kerberos and Apache Knox for authentication and perimeter security; and native and partner solutions for encrypting data over the wire and at rest.
In addition to authentication and access control, data protection adds a robust layer of security by making data unreadable in transit over the network or at rest on disk.
Compliance regulations such as HIPAA and PCI stipulate that encryption be used to protect sensitive patient information and credit card data. Federal agencies and enterprises in compliance-driven industries, such as healthcare, financial services and telecom, leverage data-at-rest encryption as a core part of their data protection strategy. Encryption helps protect sensitive data in case of an external breach or unauthorized access by privileged users.
There are several encryption methods, varying in degree of protection. Disk or OS-level encryption is the most basic form and protects against stolen disks. Application-level encryption, on the other hand, provides a higher level of granularity and prevents rogue admin access; however, it adds a layer of complexity to the architecture.
Hadoop users have traditionally relied on disk encryption methods such as dm-crypt for data protection. Although OS-level encryption is transparent to Hadoop, it adds a performance overhead and does not prevent admin users from accessing sensitive data. Hadoop users are now looking to identify and encrypt only sensitive data, a requirement that calls for finer-grained encryption at the data level.
The HDFS community worked together to build and introduce transparent data encryption in HDFS. The goal was to encrypt specific HDFS files by writing them to HDFS directories known as encryption zones (EZ). The solution is transparent to applications that use the HDFS file system, such as Apache Hive and Apache HBase. In other words, no major code change is required for existing applications already running on top of HDFS. One big advantage of encryption in HDFS is that even privileged users, such as the “hdfs” superuser, can be blocked from viewing encrypted data.
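To make the idea of an encryption zone concrete, here is a minimal sketch of how one might be set up from a script. It assumes a cluster where a KMS is already configured as the key provider and the script runs as a user allowed to create keys and zones. The key name `patient_key` and the path `/data/secure/patients` are hypothetical, chosen only for illustration. The sketch defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def encryption_zone_commands(key_name, zone_path):
    """Return the CLI commands that create a key and an encryption zone.

    The key name and path are supplied by the caller; nothing here is
    specific to a particular cluster.
    """
    return [
        # Create the encryption zone key in the configured key provider (KMS).
        ["hadoop", "key", "create", key_name],
        # The zone directory must exist and be empty before it becomes a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # Turn the empty directory into an encryption zone tied to the key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


def create_encryption_zone(key_name, zone_path, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    commands = encryption_zone_commands(key_name, zone_path)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical key name and path, shown as a dry run only.
create_encryption_zone("patient_key", "/data/secure/patients")
```

Once the zone exists, files written under that directory are encrypted on write and decrypted on read for users who have access to the key, which is why applications such as Hive need no major code change.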
As with any other Hadoop security initiative, we have adopted a phased approach to introducing this feature to customers running HDFS in production environments. After the technical preview announcement earlier this year, the Hortonworks team worked with a select group of customers to gather use cases and perform extensive testing against those use cases. We have also devoted significant development effort to building secure key storage in Ranger by leveraging the open source Hadoop KMS. Ranger now provides centralized policy administration, key management and auditing for HDFS encryption.
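The reason key management matters for blocking privileged users can be pictured as two independent checks: one for file-system permission and one for access to the zone key. The sketch below is a toy model of that idea, not the Ranger KMS API. The `KeyPolicy` class, its fields, and the user names are hypothetical and exist only to illustrate the separation of duties.

```python
from dataclasses import dataclass, field


@dataclass
class KeyPolicy:
    """Hypothetical stand-in for a key policy: who may decrypt with a zone key."""
    allowed_users: set = field(default_factory=set)
    denied_users: set = field(default_factory=set)

    def may_decrypt(self, user):
        # A deny entry always wins over an allow entry.
        return user in self.allowed_users and user not in self.denied_users


def can_read_plaintext(user, has_hdfs_read_permission, key_policy):
    """Plaintext is visible only if BOTH checks pass:
    file-system permission and permission to use the key."""
    return has_hdfs_read_permission and key_policy.may_decrypt(user)


policy = KeyPolicy(allowed_users={"alice"}, denied_users={"hdfs"})

# The superuser passes every file-system check but has no key access.
print(can_read_plaintext("hdfs", True, policy))
# A regular user with both file permission and key access can read.
print(can_read_plaintext("alice", True, policy))
# A user without file permission is stopped before the key matters.
print(can_read_plaintext("bob", False, policy))
```

In this model the superuser can still list, move, or back up the encrypted files, but cannot obtain the plaintext, because the key decision is made separately from the file-system decision.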
We believe that HDFS encryption, backed by Ranger KMS, is now enterprise-ready for specific use cases. We will introduce support for these use cases as part of the HDP 2.3 release.
The HDFS encryption solution consists of three components (more details are available on the Apache website).
We have extensively tested HDFS data-at-rest encryption across the HDP stack and will provide a detailed set of best practices for using it across various use cases as part of the HDP 2.3 release.
We are also working with key encryption partners so that they can integrate their own enterprise-ready KMS offerings with HDFS encryption. This offers a broader choice to customers looking to encrypt their data in Hadoop.
In summary, to encrypt sensitive data, protect against privileged access and go beyond OS-level encryption, enterprises can now use HDFS transparent encryption. Both HDFS encryption and Ranger’s KMS are open source, enterprise-ready, and satisfy compliance-sensitive requirements. As such, they facilitate Hadoop adoption among compliance-conscious enterprises.