August 11, 2014

Dataguise: Sensitive Data Discovery And Data-Centric Protection

This is a guest post from Hortonworks partner Dataguise. Dataguise is an HDP 2.1 certified technology partner providing sensitive data discovery, protection and reporting in Hadoop.

According to a 2013 Global Data Breach study by the Ponemon Institute, the average cost of data loss in the United States exceeds $5.4 million per breach, and the average cost of lost data approaches $200 per record. No industry is spared from this threat, and every data system, including Hadoop, needs to address the security concern. Protecting sensitive data in Hadoop is now an imperative for both IT and the business.

Enterprises are adopting the Modern Data Architecture with Hadoop and the Hortonworks Data Platform (HDP) to cost-effectively capture, store and process all data, structured and unstructured. With the introduction of Apache Hadoop YARN, HDP can host multiple data applications and users accessing the same data simultaneously. This underscores the value of the joint HDP and DGSecure solution in providing comprehensive, coordinated security for enterprise Hadoop.

Understanding where the sensitive data is located is crucial to assessing and managing this risk. DGSecure for Hadoop scans your data in structured, semi-structured or unstructured formats and then masks or encrypts your data, providing you with a complete dashboard to track and report all sensitive data protections in your environment. Below are two examples of how customers are realizing the value Hadoop can bring while ensuring compliance and data protection.
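To make the discover-then-protect workflow concrete, here is a minimal sketch of pattern-based discovery in Python. It assumes sensitive values can be recognized with regular expressions; the `PATTERNS` table and the `discover` and `redact` functions are hypothetical illustrations, not DGSecure's API or detection logic, which this post does not describe.

```python
import re
from typing import Dict, List

# Hypothetical detectors for illustration only; the real DGSecure
# discovery rules are not described here.
PATTERNS: Dict[str, "re.Pattern"] = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def discover(text: str) -> Dict[str, List[str]]:
    """Return every match for each sensitive-data type found in text."""
    found: Dict[str, List[str]] = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


def redact(text: str) -> str:
    """Replace each discovered value with a placeholder naming its type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


record = "Jane Doe, SSN 123-45-6789, jane@example.com"
print(discover(record))  # {'SSN': ['123-45-6789'], 'EMAIL': ['jane@example.com']}
print(redact(record))    # Jane Doe, SSN <SSN>, <EMAIL>
```

Discovery is kept separate from protection so that the findings can feed a report (what was found, and where) before any masking or encryption policy is applied. Note that a regex-only approach misses values such as free-text names, which is why production tools typically combine several detection techniques.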

Securing a Healthcare Analytics Hadoop Application

Inaccuracies within the healthcare billing system are a major burden and cost. The American Medical Association notes that nearly 1 in 10 healthcare bills contains errors, and that $43 billion could have been saved since 2010 if commercial insurers had consistently paid claims accurately. One company helping to improve this statistic is an innovative healthcare analytics organization that combines clinical expertise and analytical technology inside Hadoop to identify and reclaim excessive and inaccurate healthcare charges. It uses Dataguise to discover specific sensitive data elements (in flight and at rest) and to mask and encrypt them. These elements include Protected Health Information (PHI) such as names, health records, addresses, and billing amounts. The ability to discover and protect specific sensitive elements in Hadoop is one of the unique differentiators of the Dataguise solution. Consistent, flexible masking preserves coding accuracy: because the same sensitive value is always masked the same way, the bindings between diagnoses and procedure costs stay intact. The organization then uses the Hortonworks Data Platform and a leading analytics solution to validate billing consistency and report on billing inaccuracies, achieving a 99% success rate. The flexible and intelligent masking and encryption provided by Dataguise allow this organization to comply with both federal (HIPAA) and 38 state privacy laws, and to identify new revenue streams by sharing de-identified medical records with clients, partners and government health agencies in a secure, private, HIPAA-compliant format.
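Consistent masking is what keeps those bindings usable: if a patient name always maps to the same token, claims belonging to one patient still group together after masking. The sketch below shows one common way to get that property, a keyed hash (HMAC-SHA256) over the value. It is a generic technique, not Dataguise's masking algorithm; the key, the `consistent_mask` function, the token format and the sample claims are all hypothetical, and only the patient name is masked for brevity.

```python
import hashlib
import hmac

# Assumption: in practice this secret would live in a key manager,
# never alongside the masked data.
SECRET_KEY = b"replace-with-a-managed-secret"


def consistent_mask(value: str, field: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically replace a sensitive value with an opaque token.

    The same (field, value) pair always yields the same token, so rows that
    shared a patient before masking still share a token afterward.
    """
    digest = hmac.new(key, f"{field}:{value}".encode("utf-8"), hashlib.sha256)
    return f"{field}_{digest.hexdigest()[:16]}"


# Hypothetical sample claims (procedure codes and amounts are made up).
claims = [
    {"patient": "Jane Doe", "procedure": "99213", "billed": 180.00},
    {"patient": "Jane Doe", "procedure": "80053", "billed": 95.50},
    {"patient": "John Roe", "procedure": "99213", "billed": 175.00},
]

# Mask the patient name; procedure and billed amount pass through unchanged.
masked = [{**c, "patient": consistent_mask(c["patient"], "patient")} for c in claims]
for row in masked:
    print(row)
```

A keyed hash is used rather than a plain hash so that someone without the key cannot rebuild the mapping simply by hashing a list of known names.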

Leveraging Mobile Phone Usage Logs to Improve User Experience and Enhance Product Features

A global smartphone manufacturer uses Hadoop to capture and aggregate phone logging data (product, usage and user configuration information). This data is then de-identified with Dataguise DGSecure via an Apache Flume agent. Dataguise encrypts and masks specific sensitive data elements within the larger volume of data, leaving the key product and usage data open for business users' analytical and reporting needs. This ensures compliance with U.S. and European privacy regulations and results in a highly scalable, high-performance, on-demand (and secure) analytics platform that product teams can use to continuously improve products, add new features and enhance the overall user experience.
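Field-level de-identification of this kind can be sketched as a per-record transform that an ingest step might apply before the data lands in Hadoop. The field names, the `RULES` table and the partial-masking and drop policies below are hypothetical; the manufacturer's schema and the DGSecure Flume integration are not described in this post, and real encryption would require a cryptographic library rather than the masking shown here.

```python
import json
from typing import Any, Dict

# Hypothetical per-field rules; fields without a rule pass through untouched.
RULES = {
    "phone_number": "partial",  # keep only the last four characters
    "imei": "partial",
    "account_email": "drop",    # remove the field entirely
}


def partial_mask(value: str, keep: int = 4) -> str:
    """Replace all but the last `keep` characters with '*'.

    Values no longer than `keep` are masked completely.
    """
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]


def deidentify(record: Dict[str, Any]) -> Dict[str, Any]:
    """Apply RULES to one log record, leaving product and usage fields open."""
    out: Dict[str, Any] = {}
    for name, value in record.items():
        rule = RULES.get(name)
        if rule == "drop":
            continue
        out[name] = partial_mask(str(value)) if rule == "partial" else value
    return out


# Hypothetical log line with product, usage and identifying fields.
line = ('{"model": "X1", "os_version": "4.4", "app_launches": 37, '
        '"phone_number": "5551234567", "imei": "490154203237518", '
        '"account_email": "user@example.com"}')
clean = deidentify(json.loads(line))
print(clean)
```

Applying the rules per field, instead of masking whole records, is what leaves the product and usage columns available for analysis.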

The Role of Apache Argus

As we see more and more companies turning to Hadoop, we also see security considerations playing a bigger role. The recently announced Apache Argus incubator project provides central administration and coordinated enforcement of enterprise security policy for a Hadoop cluster. Dataguise plans to work with the Apache Argus community on an integrated approach to authorized decryption, in which Dataguise decryptions are authorized and controlled centrally through the Argus authorization framework, allowing clients to achieve maximum value and security within their Hadoop deployments.

Try It Today

Visit the Dataguise tutorial to try it in the Hortonworks Sandbox.

