August 21, 2014

Hadoop Security in the Enterprise

Zettaset is a Hortonworks partner. In this guest blog, John Armstrong, VP of Marketing at Zettaset Inc., shares Zettaset’s security features and explains why data encryption is vital for data in the Hadoop infrastructure.

Comprehensive Security Across the Hadoop Infrastructure

As big data technologies like Hadoop become widely deployed in production environments, the expectation is that they will meet enterprise requirements for data governance, operations, and security while integrating with existing data center infrastructure. The technology is no longer contained within a relatively small, controlled IT environment; it now interfaces with broadly available analytics applications in the business units. Data within the Hadoop cluster environment is fluid: big data is replicated in many places and moves as needed. Security must therefore be consistently applied and enforced across a distributed computing environment.

Enterprises recognize that big data requires a comprehensive and coordinated approach to security, and the open source community, with Hortonworks in the lead, has embraced this challenge with a number of Apache projects, including Apache Knox, Kerberos, and the recently announced Apache Argus incubator project.

Adrian Lane of Securosis recently penned an excellent article on the differences between traditional databases and big data architectures, and the security challenges that follow, so I needn't go into detail in this blog. Suffice it to say, for organizations handling sensitive data, the risks associated with data security and possible non-compliance are too high to ignore. In the case of a breach, the enterprise faces brand damage control as well as potential impacts on customer confidence and business. Ideally, the enterprise should focus on developing a comprehensive security strategy that includes encryption, fine-grained access control, and security policy enforcement that works well in an environment where data is shared across multiple platforms, Hadoop being one such platform.


Hortonworks Data Platform (HDP) 2.1 provides a centralized security framework with Apache Argus, Apache Knox, and Kerberos to deliver authentication, authorization, auditing, and administration. However, this does not eliminate the need for additional protection against unauthorized access, nor the need to support compliance with regulations and mandates such as PCI DSS, HIPAA, and HITECH.

Encrypting Sensitive Data

Encryption is a highly reliable security method that can be used to protect data-at-rest within the cluster. Encryption can prevent data exposure even if a server is physically removed from a data center, which is critical for organizations in highly regulated industries such as financial services and healthcare that handle sensitive data. HIPAA deals with the privacy, security, and transmission of medical information. The HIPAA Security Rule deals specifically with Electronic Protected Health Information (EPHI), and names addressable and required implementation specifications that include the encryption of a patient's protected health information. PCI DSS imposes similar rules on an individual's personal and financial information.
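To make the data-at-rest idea concrete, here is a minimal sketch in Python using the widely used `cryptography` package (not part of Zettaset's product; purely illustrative). Authenticated symmetric encryption means the ciphertext on disk is useless without the key, even if the physical drive is removed:

```python
from cryptography.fernet import Fernet

# In a real cluster the key lives in a key manager or HSM,
# never alongside the data it protects.
key = Fernet.generate_key()
cipher = Fernet(key)

plaintext = b"patient: Jane Doe, diagnosis: ..."  # example EPHI record
token = cipher.encrypt(plaintext)   # what actually lands on disk

# The stored token reveals nothing without the key;
# decryption also authenticates (tampering raises an error).
assert cipher.decrypt(token) == plaintext
```

The essential operational point is key separation: the encrypted blocks can be replicated freely across the cluster, while access control collapses to controlling access to the key.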

A major provider of healthcare services in the U.S. captures terabytes of EPHI on a monthly basis, which is secured and encrypted by Zettaset. One particular type of unstructured data the customer gathers is physicians' notes, which are typically jotted down on wireless tablets while a doctor is meeting with a patient. Of course, like all EPHI, this raw data and the information derived from it must be strictly secured by law and meet HIPAA privacy requirements. The notes are maintained in a Hadoop database, where the healthcare service provider analyzes input from thousands of physicians and derives valuable, actionable information. For example, by correlating the age, sex, and diagnosis of large samples of patients with prescribed care and medication, analysts are able to determine which care regimens deliver the best outcomes for patients with specific ailments. This information can be used to guide future medical diagnoses and treatments, as well as help the healthcare organization evaluate the efficacy of its medical staff.

Zettaset has developed a KMIP-standard encryption solution, which is compatible with HDP 2.1 (as well as earlier 1.x versions) and other Hadoop and NoSQL databases. Zettaset's encryption solution is optimized for Hadoop's distributed architecture, but acknowledges that encryption solutions for centralized RDBMSs exist in many organizations as well. Zettaset takes a standards-based approach that simplifies integration of big data encryption into existing data environments with a mix of Hadoop and RDBMSs, and ensures compatibility with PKCS-compliant hardware security modules (HSMs) that an organization may have already invested in.
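The pattern behind an HSM- or KMIP-backed design is envelope encryption: a master key held by the key manager never leaves it in the clear; it only wraps (encrypts) the short-lived data keys that actually encrypt the blocks. The sketch below illustrates the flow with the `cryptography` package; the key names and the in-process "master key" are stand-ins for what would really be an HSM or KMIP server call, not Zettaset's actual implementation:

```python
from cryptography.fernet import Fernet

# Stand-in for a master key held inside an HSM / KMIP key manager.
master = Fernet(Fernet.generate_key())

# Fresh data key for one file or block; this is what does the bulk encryption.
data_key = Fernet.generate_key()
wrapped_key = master.encrypt(data_key)          # key wrapping: only the wrapped form is stored

block = b"raw HDFS block bytes"
ciphertext = Fernet(data_key).encrypt(block)    # store ciphertext + wrapped_key together

# Decrypt path: unwrap the data key via the key manager, then decrypt the block.
recovered_key = master.decrypt(wrapped_key)
assert Fernet(recovered_key).decrypt(ciphertext) == block
```

This is why a standards-based (KMIP/PKCS) interface matters: the same key manager can wrap keys for Hadoop, NoSQL stores, and traditional RDBMSs alike, so revoking or rotating the master key takes effect across every platform at once.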


There is a tremendous push in the open source community, in partnership with leaders in data security like Zettaset, to provide Hadoop with robust security. The approach addresses the unique architecture of distributed computing and is designed to meet the security requirements of both the enterprise data center and the Hadoop cluster environment. A comprehensive solution will combine the best efforts of the open source community with proprietary data security solutions that can function across multiple platforms, including Hadoop.

Tim O’Reilly, a strong proponent of open source, once stated:

Any successful industry provides a balance of open and proprietary. At the heart of the open PC hardware platform is a proprietary CPU, and a variety of proprietary devices. At the heart of the open Internet are proprietary Cisco routers, and for every open source program, there are proprietary ones as well.

The ideal big data security solution will ultimately consist of best-in-breed solutions driven by customer requirements.

