August 12, 2014

Protegrity: Data Security in Hadoop and Beyond

This is a guest blog from Protegrity, a Hortonworks certified partner.

As Hadoop takes on a more mission-critical role within the data center, the top IT imperatives of process innovation, operational efficiency, and data security naturally follow. One imperative in particular now tops the requirement list for Hadoop consideration within the enterprise: a well-developed framework to secure data.

The open source community has responded. Work is underway to build out a comprehensive and coordinated security framework for Hadoop that can work well with existing IT security investments. While no general standard has been set for Hadoop security, the Apache Argus incubator project, together with Apache Knox and Kerberos, can provide adequate role-based access control, authentication, monitoring and administration for the Hadoop ecosystem, and create a basic data security framework on which to build.

Protegrity extends consistent data security within and beyond Hadoop

Protegrity builds on this baseline by utilizing the shared role-based data security controls, and providing a single, more extensive data security framework that can protect and monitor sensitive and proprietary data across multiple environments in the enterprise, including all major distributions of Hadoop. Protegrity’s well-established file and field level data encryption and tokenization technology can be employed within these environments, creating a seamless network of data-centric security far stronger than access controls alone. The use of Protegrity Vaultless Tokenization (PVT) to de-identify rather than encrypt sensitive data can increase utility of secured data, and maximize analytical and operational efficiency across platforms. In addition, the use of PVT can solve data residency and governance issues, as well as address privacy and cross-border compliance, which access controls alone cannot.
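To make the de-identification idea concrete, here is a minimal, illustrative sketch of deterministic, format-preserving tokenization. This is not Protegrity's PVT algorithm (which is proprietary); the key, function names, and keyed-digest approach are assumptions for illustration only. It shows the two properties the paragraph relies on: tokens keep the shape of the original value, and the same input always maps to the same token, so joins and group-bys still work on protected data.

```python
import hmac
import hashlib

# Illustrative only: a real deployment would use centrally managed keys,
# not a hard-coded secret, and a reversible transform for authorized detokenization.
SECRET_KEY = b"demo-key"

def tokenize_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically map a digit string to another digit string of the
    same length, preserving format so downstream analytics keep working."""
    digest = hmac.new(key, value.encode(), hashlib.sha256).digest()
    # Derive one decimal digit per byte of the keyed digest,
    # then truncate to the original length.
    return "".join(str(b % 10) for b in digest)[: len(value)]

ssn = "123456789"
token = tokenize_digits(ssn)

# Format is preserved: same length, still all digits.
assert len(token) == len(ssn) and token.isdigit()
# Deterministic: the same input always yields the same token,
# so records can still be joined and aggregated on the tokenized field.
assert tokenize_digits(ssn) == token
```

Unlike this one-way sketch, production vaultless tokenization is reversible by authorized parties; the point here is only why tokenized data retains more analytical utility than ciphertext, which generally has neither the original format nor a stable, joinable shape.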

Other integrated Hadoop security solutions do not support field- or cell-level data security, let alone provide the flexibility of PVT for secure data utility. And no other data security provider matches Protegrity's level of comprehensive, integrated security for data flowing between Hadoop and other enterprise environments.

This combination of a shared enterprise data security framework with a high-utility protection method is ideal for modern IT architectures, where data is shared across multiple environments, often between geographically disparate data centers, and analyzed at different levels, by different roles along the way. In the case of the Teradata Unified Data Architecture (UDA), data is protected throughout the entire data flow, inside Hadoop distributions such as the Hortonworks Data Platform (HDP) stack, through to data analysis platforms, such as Teradata Aster, and beyond to detailed analysis in databases and EDWs, such as Teradata Database.

Protegrity data security throughout the Teradata Unified Data Architecture


Protegrity also provides protection for other environments within the modern data architecture, including ETL and data-integration tools such as Informatica, which can serve as a central hub for protecting data throughout the enterprise. Protegrity Gateway appliances can protect data before it leaves the enterprise for cloud-based big data deployments or applications, or for third-party use.

It takes an army…

As Hadoop adoption expands, the enterprise moves to a modern data architecture, and data flows freely between environments for various levels of use and analysis, security for data at rest and in motion must become a unified endeavor. Data must be protected at the point of collection and throughout its life inside the enterprise. A combination of tools such as Apache Argus, along with the centrally controlled and unified Protegrity Data Security Platform, provides the most efficient means of both protecting and identifying risks to data in Hadoop and throughout the enterprise data flow.

