The modern enterprise requires a comprehensive end-to-end data management solution that leverages advanced machine learning to identify and manage risk, as well as a repository capable of capturing and processing the data needed to support that solution.
Now more than ever, organizations are subject to privacy and data security laws, and complying with these regulations is exceptionally challenging given the complexity of the data that enterprises now have to manage. Yet one need only pick up a newspaper to read about the dire consequences when companies fail to put proper safeguards in place to comply with these laws. These consequences may include:
A number of broad federal regulations have been written to protect consumers from unfair or deceptive practices. In this regard, the Federal Trade Commission Act (FTC Act) addresses the collection, processing, use, and disclosure of what is known as personally identifiable information (PII). The definition of PII varies by state and country. Other laws apply to specific business sectors: the Gramm-Leach-Bliley Act (GLBA) regulates the financial services sector, and HIPAA, as amended by the Health Information Technology for Economic and Clinical Health Act (HITECH), regulates protected health information (PHI). The Electronic Communications Privacy Act (ECPA) covers the privacy of electronic communications. The Federal Rules of Civil Procedure govern electronic discovery in federal civil litigation.
The FTC Act has important implications for how data is stored, transmitted, and protected within the enterprise. Under this Act, the FTC has taken action for deceptive practices against organizations that:
Not only must companies comply with the regulations in their respective industries, they must also provide proof of compliance for governance and auditing purposes. Keeping up with the various regulations that apply to the wide variety of data the modern enterprise now has at its disposal is extremely difficult, and without a next-generation enterprise data platform the task is almost impossible.
Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) provide the modern enterprise with a comprehensive end-to-end data management solution that combines advanced machine learning (ML) functionality for identifying and managing risk with a scalable, flexible-schema data repository.
HDP enables the deployment of 100% Open Enterprise Hadoop. With the highest number of committers, Hortonworks has the capacity to drive enterprise readiness requirements into the Open Community. This empowers the adoption of the latest innovations that come out of the Apache Software Foundation and key Apache projects.
HDP is comprised of five functional areas: Data Management, Data Access, Data Governance and Integration, Security, and Operations.
Figure 1 Hortonworks Data Platform
HDP's Apache Spark engine sits in the Data Access area and brings in-memory data processing to the scalable Hadoop Distributed File System (HDFS). Comprehensive development APIs in Scala, Java, and Python allow data workers to efficiently run mixed workloads (streaming, machine learning, or SQL) on YARN for fast, iterative access to datasets.
YARN provides the resources to power the centralized architecture that enables Spark to work with other applications to share a common cluster and dataset while ensuring consistent levels of service and response.
Apache Spark provides a rich ML environment made up of Spark Core and a full set of machine learning libraries. The core is the distributed execution engine, and the Java, Scala, and Python APIs empower the enterprise to build solutions using familiar tools. The ML library provides practical machine learning that is scalable and easy to use. It consists of common learning algorithms and utilities, including dimensionality reduction, classification, regression, clustering, and recommendations.
HDP provides numerous flexible-schema data stores to capture various types of data. All of these data stores leverage the highly scalable Hadoop Distributed File System (HDFS). Apache HBase™ provides millisecond-latency SQL and NoSQL data stores. HBase is the right tool when random, real-time read/write access is needed. Apache Hive™ is the de facto standard for SQL queries on HDFS. Hive tables are similar in structure to tables in a relational database and are divided into partitions. Data can be accessed via structured query language, and Hive also supports overwriting or appending data.
HDP’s governance is designed to exchange metadata with other tools and processes both inside and outside of the Hadoop environment. This enables platform-agnostic governance controls that effectively address privacy and data security laws. HDP addresses data replication, business continuity, and lineage tracing challenges by deploying a framework for managing privacy and data security. It also centrally manages the data lifecycle and provides a foundation for audit and compliance by tracking entity lineage and maintaining audit logs.
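The lineage tracking and audit logging described above can be illustrated with a minimal sketch. It is a toy in-memory model written for this document, not the API of HDP or of any Apache governance project; the class `LineageStore`, its methods, and the dataset names are all hypothetical. It assumes that lineage can be represented as "output derived from inputs" relationships and that audit entries are chained by hash so that later tampering is detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class LineageStore:
    """Hypothetical toy store: dataset derivations plus an append-only audit log."""
    parents: dict = field(default_factory=dict)    # dataset -> list of direct inputs
    audit_log: list = field(default_factory=list)  # hash-chained audit entries

    def record(self, user, action, output, inputs=()):
        """Register a derivation and append a tamper-evident audit entry."""
        self.parents.setdefault(output, []).extend(inputs)
        prev_hash = self.audit_log[-1]["hash"] if self.audit_log else ""
        entry = {"user": user, "action": action, "output": output,
                 "inputs": list(inputs), "prev": prev_hash}
        payload = json.dumps(entry, sort_keys=True).encode("utf-8")
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.audit_log.append(entry)

    def upstream(self, dataset):
        """Return every dataset that `dataset` depends on, directly or indirectly."""
        seen, stack = set(), list(self.parents.get(dataset, []))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, []))
        return seen

    def verify(self):
        """Recompute the hash chain; False means an entry was altered or removed."""
        prev_hash = ""
        for entry in self.audit_log:
            body = {k: v for k, v in entry.items() if k != "hash"}
            payload = json.dumps(body, sort_keys=True).encode("utf-8")
            digest = hashlib.sha256(payload).hexdigest()
            if body["prev"] != prev_hash or digest != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


# Illustrative usage with made-up dataset names.
store = LineageStore()
store.record("etl", "ingest", "raw_claims")
store.record("etl", "mask_pii", "masked_claims", ["raw_claims"])
store.record("analyst", "aggregate", "claims_report", ["masked_claims"])
```

Calling `store.upstream("claims_report")` walks the derivation graph back to `raw_claims`, which is the kind of question an auditor asks when tracing where regulated data ended up, and `store.verify()` checks that no audit entry has been changed since it was written.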
As the world generates more data, from every kind of device and from social media, companies are increasingly discovering the need to manage privacy and security risk. To manage this risk, the modern data platform must leverage highly secure, lightweight agents that can be easily deployed anywhere. Data must be treated as a continuous flow from source to destination, so that modern analytical applications can collect, conduct, and curate the data in a secure, scalable, and reliable manner.
Hortonworks DataFlow, powered by Apache NiFi, supports the modern enterprise with the following key capabilities:
Comprehensive Data Audit and Provenance
Figure 2 Hortonworks Data Flow
These capabilities are designed to address the unique data and security risk requirements of the modern enterprise, and they enable data stewards to construct secure and reliable data grids as continuous data flows for timely processing, from anything and from anywhere, at scale. As a result, Hortonworks customers can now securely collect, conduct, and curate any type of "data-in-motion" with HDF, as well as securely view and store data at rest with HDP. Together, Hortonworks Data Platform and Hortonworks DataFlow provide both historical and perishable insights protected by industry-leading security.
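To make the idea of data in motion with provenance more concrete, here is a minimal sketch of a flow that records one event per record per step while masking e-mail addresses in flight. It does not use Apache NiFi or HDF; the functions `run_flow` and `mask_emails`, the event names, and the sample records are assumptions introduced purely for illustration, and the regular expression is a rough approximation rather than a complete PII detector.

```python
import re
import uuid

# Rough pattern for e-mail addresses; an illustration only, not a full PII detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text):
    """Replace anything that looks like an e-mail address with a fixed token."""
    return EMAIL.sub("[REDACTED]", text)


def run_flow(records, steps):
    """Push each record through the named steps, logging one provenance event per step."""
    provenance, delivered = [], []
    for content in records:
        record_id = str(uuid.uuid4())  # identity that ties all events for a record together
        provenance.append({"id": record_id, "event": "received", "step": "source"})
        for name, step in steps:
            updated = step(content)
            event = "modified" if updated != content else "passed"
            provenance.append({"id": record_id, "event": event, "step": name})
            content = updated
        provenance.append({"id": record_id, "event": "sent", "step": "destination"})
        delivered.append(content)
    return delivered, provenance


# Illustrative usage with made-up records.
delivered, provenance = run_flow(
    ["contact alice@example.com", "no pii here"],
    [("mask_emails", mask_emails), ("strip", str.strip)],
)
```

Here `delivered` holds the curated records and `provenance` holds the ordered event trail for each one, so a data steward could answer which step changed a given record and whether it reached its destination.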
Only HDP and HDF offer a complete solution for managing privacy and data security in the modern enterprise regardless of the volume, velocity or variety of data.