Agile enterprise compliance through metadata
Atlas is designed to exchange metadata with other tools and processes within and outside of the Hadoop stack, thereby enabling platform-agnostic governance controls that effectively address compliance requirements
Apache Atlas provides scalable governance for Enterprise Hadoop that is driven by metadata. Atlas, at its core, is designed to easily model new business processes and data assets with agility. This flexible type system allows exchange of metadata with other tools and processes within and outside of the Hadoop stack, thereby enabling platform-agnostic governance controls that effectively address compliance requirements
Apache Atlas is developed around two guiding principles:
Apache Atlas empowers enterprises to effectively and efficiently address their compliance requirements through a scalable set of core governance services. These services include:
Apache Atlas is designed to effectively exchange metadata within Hadoop and the broader data ecosystem. Atlas’s adaptive model reduces enterprise time to compliance by leveraging existing metadata and industry-specific taxonomy. With Atlas, data administrators and stewards also have the ability to define, annotate and automate the capture of relationships between data sets and underlying elements including source, target and derivation processes.
Atlas also ensures downstream metadata consistency across the ecosystem by enabling enterprises to easily export metadata to third-party systems.
Big Data brings democratization of information access and eases how information can be shared across the enterprise. However, unplanned growth can result in ‘data swamps’ with content that is not tagged or cataloged adequately. Business taxonomies can provide the missing link in closing this gap. From the Greek, ‘taxis’, meaning ‘order’ and ‘arrangement’, taxonomies use a hierarchy of terms to classify and arrange concepts or physical/ logical objects making them the ideal vehicle to capture the structure of the entire domain of an enterprise’s content.
Consistent classification and tagging across the enterprise using taxonomies supports system/ platform interoperability and value generation from structured and unstructured data sources by mapping them to common shared vocabulary. This authoritative reference taxonomy improves both data confidence and time to insight.
Requirements for a Big Data Business Catalog
The combination of these search capabilities empowers data stewards to construct a model of their organization and how it conducts business. These includes the ability to model a business by combining both logical and physical data entities to develop a more complete understanding.
The Atlas/ Ranger integration represents a paradigm shift for big data governance and security. By integrating Atlas with Ranger enterprises can now implement dynamic classification-based security policies, in addition to role-based security. Ranger’s centralized platform empowers data administrators to define security policy based on Atlas metadata tags or attributes and apply this policy in real-time to the entire hierarchy of data assets including databases, tables and columns.
Latest release of Apache Atlas has focused on delivering scalable metadata services to model any business process enhanced with industry-specific terminology, as well as the ability to import and export metadata from other systems and tools.
|Apache Atlas Version||Progress|
|Apache Atlas 0.7||
|Apache Atlas 0.6||
|Apache Atlas 0.5||
To address enterprise requirements for Hadoop application integration, Atlas strives to foster a vibrant ecosystem based on a centralized metadata store. The Governance Ready program aims to create a curated group of partners that contribute a rich set of data management features focusing on data preparation, integration, cleansing, tagging, ETL visualization and collaboration areas.
Certified partners will help define a set of standards to exchange metadata and contribute conforming data integration features to the metadata store. Customers can then subscribe to desired features with low switching costs and faster deployment time.