June 10, 2015

New in HDP 2.3: Enterprise Grade HDFS Data At Rest Encryption

Apache Hadoop has emerged as a critical data platform for delivering the business insights hidden in big data. Because Hadoop is a relatively new technology, system administrators hold it to higher security standards. There are several reasons for this scrutiny:

  • The external ecosystem of data repositories and operational systems that feed Hadoop deployments is highly dynamic and can introduce new security threats on a regular basis.
  • A Hadoop deployment contains a large volume of diverse data stored over long periods of time. Any breach of this enterprise-wide data can be catastrophic.
  • Hadoop enables users across multiple business units to access, refine, explore and enrich data using different methods, thereby raising the risk of a breach.

Security Pillars in Hortonworks Data Platform (HDP)

HDP is the only Hadoop platform offering comprehensive security and centralized administration of security policies across the entire stack. At Hortonworks, we take a holistic view of enterprise security requirements and ensure that Hadoop can not only define but also apply a comprehensive policy. HDP leverages Apache Ranger for centralized security administration, authorization and auditing; Kerberos and Apache Knox for authentication and perimeter security; and native and partner solutions for encrypting data over the wire and at rest.

hdf_sec_1

Data at Rest Encryption – State of the Union

In addition to authentication and access control, data protection adds a robust layer of security by making data unreadable in transit over the network or at rest on a disk.

Compliance regulations, such as HIPAA and PCI, stipulate that encryption be used to protect sensitive patient information and credit card data. Federal agencies and enterprises in compliance-driven industries, such as healthcare, financial services and telecom, use data at rest encryption as a core part of their data protection strategy. Encryption helps protect sensitive data in case of an external breach or unauthorized access by privileged users.

There are several encryption methods, offering varying degrees of protection. Disk or OS level encryption is the most basic form and protects against stolen disks. Application level encryption, on the other hand, provides a higher level of granularity and prevents rogue admin access; however, it adds a layer of complexity to the architecture.

Hadoop users have traditionally chosen disk encryption methods such as dm-crypt for data protection. Although OS level encryption is transparent to Hadoop, it adds a performance overhead and does not prevent admin users from accessing sensitive data. Hadoop users are now looking to identify and encrypt only sensitive data, a requirement that calls for finer-grained encryption at the data level.

Certifying HDFS Encryption

The HDFS community worked together to build and introduce transparent data encryption in HDFS. The goal was to encrypt specific HDFS files by writing them to HDFS directories known as encryption zones (EZ). The solution is transparent to applications that use the HDFS file system, such as Apache Hive and Apache HBase. In other words, no major code changes are required for existing applications already running on top of HDFS. One big advantage of encryption in HDFS is that even privileged users, such as the “hdfs” superuser, can be blocked from viewing encrypted data.
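As a rough illustration of how an encryption zone is typically set up, the sketch below assembles the standard `hadoop key` and `hdfs crypto` command lines in Python. The function name, the key name `pii_key` and the path `/data/pii` are hypothetical and chosen only for this example. The helper does not execute anything; it only returns the commands so they can be reviewed or passed to `subprocess.run` on a cluster where a key provider is already configured.

```python
from typing import List


def encryption_zone_commands(key_name: str, zone_path: str) -> List[List[str]]:
    """Return the CLI invocations that set up one encryption zone.

    Nothing is executed here; the caller decides whether to run the
    commands or just review them. Names are illustrative assumptions.
    """
    if not zone_path.startswith("/"):
        raise ValueError("zone_path must be an absolute HDFS path")
    return [
        # 1. Create the zone key in the configured key provider (KMS).
        ["hadoop", "key", "create", key_name],
        # 2. Create the directory; a zone must be created on an empty directory.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Turn the directory into an encryption zone (requires the HDFS superuser).
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # 4. List the zones to confirm the result.
        ["hdfs", "crypto", "-listZones"],
    ]


if __name__ == "__main__":
    for argv in encryption_zone_commands("pii_key", "/data/pii"):
        print(" ".join(argv))
```

Once the zone exists, files written under the zone path are encrypted and decrypted by the HDFS client, which is why applications such as Hive or HBase need no major changes.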

As with any other Hadoop security initiative, we have adopted a phased approach to introducing this feature to customers running HDFS in production environments. After the technical preview announcement earlier this year, the Hortonworks team worked with a select group of customers to gather use cases and perform extensive testing against them. We have also devoted significant development effort to building secure key storage in Ranger by leveraging the open source Hadoop KMS. Ranger now provides centralized policy administration, key management and auditing for HDFS encryption.

We believe that HDFS encryption, backed by Ranger KMS, is now enterprise ready for specific use cases. We will introduce support for these use cases as part of the HDP 2.3 release.

HDFS encryption in HDP – Components and Scope

hdfs_sec_2

The HDFS encryption solution consists of three components (more details are available on the Apache website):

  • HDFS encryption/decryption enforcement: HDFS client-level encryption and decryption for files within an encryption zone
  • Key provider API: the API used by the HDFS client to interact with the KMS and retrieve keys
  • Ranger KMS: The open source Hadoop KMS is a proxy that retrieves keys for a client. Working with the community, we have enhanced the Ranger GUI to store keys securely in a database and to centralize policy administration and auditing. (Please refer to the screenshots below.)
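To make the interplay between these three components more concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not the HDFS implementation: `ToyKMS`, `toy_cipher`, `write_file` and `read_file` are hypothetical names, and the XOR keystream merely stands in for a real cipher such as AES. The model shows the envelope pattern described in the Apache design: each file gets its own data key, that key is stored only in wrapped form, and only the KMS, which holds the zone key and checks policy, can unwrap it.

```python
import hashlib
import os
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. A stand-in for AES, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # XOR is symmetric, so the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class ToyKMS:
    """Holds zone keys and a per-key access list (hypothetical model)."""
    zone_keys: Dict[str, bytes] = field(default_factory=dict)
    allowed: Dict[str, Set[str]] = field(default_factory=dict)

    def create_key(self, name: str, users: Set[str]) -> None:
        self.zone_keys[name] = os.urandom(32)
        self.allowed[name] = set(users)

    def generate_encrypted_key(self, name: str) -> bytes:
        """Create a fresh per-file data key, returned wrapped by the zone key."""
        return toy_cipher(self.zone_keys[name], os.urandom(32))

    def decrypt_encrypted_key(self, name: str, wrapped: bytes, user: str) -> bytes:
        """Unwrap a data key, but only for users the policy allows."""
        if user not in self.allowed[name]:
            raise PermissionError(f"{user} may not use key {name}")
        return toy_cipher(self.zone_keys[name], wrapped)


def write_file(kms: ToyKMS, key_name: str, user: str, plaintext: bytes) -> Tuple[bytes, bytes]:
    """Client-side write: returns (wrapped data key, ciphertext) as stored."""
    wrapped = kms.generate_encrypted_key(key_name)
    data_key = kms.decrypt_encrypted_key(key_name, wrapped, user)
    return wrapped, toy_cipher(data_key, plaintext)


def read_file(kms: ToyKMS, key_name: str, user: str, wrapped: bytes, ciphertext: bytes) -> bytes:
    """Client-side read: unwrap the data key via the KMS, then decrypt."""
    data_key = kms.decrypt_encrypted_key(key_name, wrapped, user)
    return toy_cipher(data_key, ciphertext)


if __name__ == "__main__":
    kms = ToyKMS()
    kms.create_key("pii_key", users={"alice"})
    wrapped, stored = write_file(kms, "pii_key", "alice", b"card=4111")
    print(read_file(kms, "pii_key", "alice", wrapped, stored))
    try:
        read_file(kms, "pii_key", "hdfs", wrapped, stored)
    except PermissionError as err:
        print("denied:", err)
```

In this model the "hdfs" user can still see the stored ciphertext and the wrapped key, but cannot recover the plaintext because the KMS policy refuses to unwrap the data key. This is the property that lets key access be governed separately from file system permissions.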

hdfs_sec_3

hdfs_sec_4

We have extensively tested HDFS data at rest encryption across the HDP stack and will provide a detailed set of best practices for using it across various use cases as part of the HDP 2.3 release.

We are also working with key encryption partners so that they can integrate their own enterprise-ready KMS offerings with HDFS encryption. This offers a broader choice to customers looking to encrypt their data in Hadoop.

Summary

In summary, enterprises can now use HDFS transparent encryption to encrypt sensitive data, protect against access by privileged users and go beyond OS level encryption. Both HDFS encryption and Ranger’s KMS are open source and enterprise-ready, and they satisfy compliance-sensitive requirements. As such, they facilitate Hadoop adoption among compliance-conscious enterprises.
