White Paper: Better Apache Hadoop Security & Data Governance
As organizations pursue Hadoop initiatives to capture new opportunities for data-driven insights, data governance and security requirements can pose a key challenge. Hortonworks created the Data Governance Initiative to address the need for an open source governance solution that manages data classification, data lineage, security and the data lifecycle.
Effective data management and control cannot be passive or simply forensic. Centralized access control powered by consistent data classification is the foundation for dynamic security and is a core requirement for Open Enterprise Hadoop. Toward this goal, Hortonworks is announcing the release of new public preview features for Apache Atlas and Apache Ranger that bring data classification together with security policy enforcement.
Apache Atlas, created as part of the Data Governance Initiative, empowers organizations to apply consistent data classification across the data ecosystem. Apache Ranger provides centralized security administration for Hadoop. By integrating Atlas with Ranger, Hortonworks empowers enterprises to institute dynamic access policies at run time that proactively prevent violations from occurring.
The Atlas/Ranger integration represents a paradigm shift for big data governance and security. By integrating Atlas with Ranger, enterprises can now implement dynamic classification-based security policies in addition to role-based security. Ranger’s centralized platform empowers data administrators to define security policy based on Atlas metadata tags or attributes and to apply this policy in real time to the entire hierarchy of data assets, including databases, tables and columns.
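To make the idea of classification-based policy concrete, here is a minimal conceptual sketch in Python. It is not the Ranger or Atlas API: the names `TagPolicy`, `ASSET_TAGS`, `effective_tags` and `is_allowed`, the example tags and groups, and the rule that tags are inherited from database to table to column are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy: only these groups may read assets carrying `tag`."""
    tag: str
    allowed_groups: frozenset


# Hypothetical classification tags, keyed by "database.table.column" paths.
ASSET_TAGS = {
    "sales": {"INTERNAL"},
    "sales.customers": set(),
    "sales.customers.ssn": {"PII"},
}

# Hypothetical policies defined once, against tags rather than against assets.
POLICIES = [
    TagPolicy("PII", frozenset({"compliance"})),
    TagPolicy("INTERNAL", frozenset({"compliance", "analysts"})),
]


def effective_tags(asset, asset_tags):
    """Collect tags from the asset and every ancestor (assumed inheritance)."""
    parts = asset.split(".")
    tags = set()
    for depth in range(1, len(parts) + 1):
        tags |= asset_tags.get(".".join(parts[:depth]), set())
    return tags


def is_allowed(user_groups, asset, asset_tags=ASSET_TAGS, policies=POLICIES):
    """Allow access only if every tag on the asset admits one of the user's groups."""
    tags = effective_tags(asset, asset_tags)
    for policy in policies:
        if policy.tag in tags and not (set(user_groups) & policy.allowed_groups):
            return False
    return True


if __name__ == "__main__":
    print(is_allowed({"analysts"}, "sales.customers"))        # table tagged via its database
    print(is_allowed({"analysts"}, "sales.customers.ssn"))    # column additionally tagged PII
    print(is_allowed({"compliance"}, "sales.customers.ssn"))
```

In this sketch, retagging a column changes who can read it without editing any policy, which is the property the classification-based approach is meant to provide. Treating untagged assets as allowed is a simplification of the sketch, not a statement about how Ranger behaves.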
Hortonworks empowers data managers to ensure the transparency, reproducibility, auditability and consistency of the Data Lake and the assets it contains. Apache Atlas now provides the ability to visualize cross-component lineage, delivering a complete view of data movement across a number of analytic engines such as Apache Storm, Kafka, Falcon and Hive. Data stewards, operations, and compliance personnel now have the ability to visualize a data set’s lineage and then drill down into operational, security and provenance-related details. As this tracking is done at the platform level, any application that uses multiple engines will be natively tracked. This allows for extended visibility beyond a single application view.
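Cross-component lineage can be pictured as a directed graph whose edges run from a source data set or process to whatever it feeds. The following sketch walks such a graph upstream to list everything a data set depends on. It is illustrative only and does not use the Atlas API: the edge list, the asset names and the `upstream` function are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges (source, target) spanning several engines.
LINEAGE_EDGES = [
    ("kafka:clicks", "storm:sessionize"),
    ("storm:sessionize", "hive:sessions"),
    ("hive:sessions", "hive:daily_report"),
    ("falcon:customers_feed", "hive:daily_report"),
]


def upstream(asset, edges):
    """Return every ancestor of `asset`, nearest first (breadth-first walk)."""
    parents = {}
    for source, target in edges:
        parents.setdefault(target, []).append(source)

    seen = set()
    ordered = []
    queue = deque([asset])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                ordered.append(parent)
                queue.append(parent)
    return ordered


if __name__ == "__main__":
    for ancestor in upstream("hive:daily_report", LINEAGE_EDGES):
        print(ancestor)
```

Because the edges in this sketch are recorded per platform component rather than per application, a single walk crosses engine boundaries, which mirrors the extended visibility described above.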