
Apache Atlas

OVERVIEW

Agile enterprise compliance through metadata

Atlas is designed to exchange metadata with other tools and processes within and outside of the Hadoop stack, enabling platform-agnostic governance controls that effectively address compliance requirements.

 

What Atlas Does

Apache Atlas provides scalable, metadata-driven governance for Enterprise Hadoop. At its core, Atlas is designed to model new business processes and data assets with agility. Its flexible type system allows metadata to be exchanged with other tools and processes within and outside of the Hadoop stack, enabling platform-agnostic governance controls that effectively address compliance requirements.

Apache Atlas is developed around two guiding principles:

  • Metadata Truth in Hadoop: Atlas provides true visibility in Hadoop. By using native connectors to Hadoop components, Atlas provides technical and operational tracking enriched by business taxonomical metadata. Atlas lets any metadata consumer share a common metadata store, which makes metadata easy to exchange and supports interoperability across many metadata producers.
  • Developed in the Open: Engineers from Aetna, Merck, SAS, Schlumberger, and Target are working together to help ensure Atlas is purpose-built to solve real data governance problems across a wide range of industries that use Hadoop. This approach is an example of open source community innovation that helps accelerate product maturity and time-to-value for the data-first enterprise.

Apache Atlas empowers enterprises to effectively and efficiently address their compliance requirements through a scalable set of core governance services. These services include:

  • Data Lineage: Captures lineage across Hadoop components at platform level
  • Agile Data Modeling: Type system allows custom metadata structures in a hierarchy taxonomy
  • REST API: Modern, flexible access to Atlas services, HDP components, UI & external tools
  • Metadata Exchange: Leverage existing metadata and models by importing them from current tools, and export metadata to downstream systems
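
As an illustration of the type system and REST API bullets above, the following minimal sketch builds a JSON document that defines two classification (tag) types arranged in a hierarchy. The field names (`classificationDefs`, `superTypes`, `attributeDefs`) reflect our understanding of the typedef format accepted by the Atlas v2 REST API and should be checked against the documentation for your Atlas version. The tag names `SENSITIVE` and `PII` and both helper functions are hypothetical. Nothing is sent over the network.

```python
import json


def classification_def(name, super_types=(), description=""):
    """Build one classification (tag) definition.

    super_types places the tag under one or more parent tags,
    which is how a hierarchy is expressed in this sketch.
    """
    return {
        "name": name,
        "description": description,
        "superTypes": list(super_types),
        "attributeDefs": [],
    }


def typedefs_payload(classifications):
    """Wrap definitions in a typedefs envelope (assumed field names)."""
    return {
        "enumDefs": [],
        "structDefs": [],
        "classificationDefs": list(classifications),
        "entityDefs": [],
    }


# Hypothetical tags: PII is modeled as a child of SENSITIVE.
payload = typedefs_payload([
    classification_def("SENSITIVE", description="Data requiring restricted handling"),
    classification_def("PII", super_types=["SENSITIVE"],
                       description="Personally identifiable information"),
])

print(json.dumps(payload, indent=2))
```

A document like this would typically be submitted to the type definition endpoint of an Atlas server; the exact path and authentication depend on the deployment.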

How Atlas Works

Apache Atlas is designed to effectively exchange metadata within Hadoop and the broader data ecosystem. Atlas’s adaptive model reduces enterprise time to compliance by leveraging existing metadata and industry-specific taxonomy. With Atlas, data administrators and stewards also have the ability to define, annotate and automate the capture of relationships between data sets and underlying elements including source, target and derivation processes.

Atlas also ensures downstream metadata consistency across the ecosystem by enabling enterprises to easily export metadata to third-party systems.
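
The relationships between data sets described above (source, target and derivation process) can be pictured as a directed graph. The sketch below is a toy in-memory model, not Atlas code: the `LineageGraph` class, its method names and the data set names are all hypothetical. It records each process with its inputs and outputs and then walks the graph to find everything a data set was derived from.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store: each process links input data sets to output data sets."""

    def __init__(self):
        # Maps an output data set to the (process, inputs) pairs that produced it.
        self._producers = defaultdict(list)

    def record_process(self, name, inputs, outputs):
        """Register a derivation process with its sources and targets."""
        for out in outputs:
            self._producers[out].append((name, tuple(inputs)))

    def upstream(self, dataset):
        """Return every data set the given one was derived from, directly or indirectly."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for _process, inputs in self._producers.get(current, []):
                for src in inputs:
                    if src not in seen:
                        seen.add(src)
                        stack.append(src)
        return seen


graph = LineageGraph()
graph.record_process("load_raw", inputs=["hdfs:/landing/orders"],
                     outputs=["hive:raw.orders"])
graph.record_process("clean", inputs=["hive:raw.orders"],
                     outputs=["hive:curated.orders"])
graph.record_process("report",
                     inputs=["hive:curated.orders", "hive:ref.customers"],
                     outputs=["hive:mart.order_report"])

print(sorted(graph.upstream("hive:mart.order_report")))
```

In a real deployment these relationships are captured automatically by the component connectors rather than recorded by hand.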

 

[Figure: Atlas architecture]

Technical Preview

Business Taxonomy (Catalog)

Big Data democratizes access to information and makes it easier to share across the enterprise. However, unplanned growth can result in ‘data swamps’ whose content is not adequately tagged or cataloged. Business taxonomies can provide the missing link in closing this gap. From the Greek ‘taxis’, meaning ‘order’ or ‘arrangement’, taxonomies use a hierarchy of terms to classify and arrange concepts or physical/logical objects, making them the ideal vehicle to capture the structure of the entire domain of an enterprise’s content.

Consistent classification and tagging across the enterprise using taxonomies supports system/platform interoperability and value generation from structured and unstructured data sources by mapping them to a common shared vocabulary. This authoritative reference taxonomy improves both data confidence and time to insight.

Requirements for a Big Data Business Catalog

  • Purpose-Built Platform Solution: To make sense of big data and help users find the right information, enterprises need a data governance solution that is designed for Hadoop and operates at the platform level, so that it consistently classifies data across all the engines the organization uses to move and analyze data. Such a solution can serve as the single source of metadata truth in Hadoop by automatically tracking multi-user, multi-application activity in Hadoop components through native connectors. Data governance solutions that operate at the application level, by contrast, require a single proprietary solution path, which ends up proliferating data silos.
  • Faster Data Discovery: The business catalog enables data officers and stewards to search for data and metadata quickly and in a number of different ways to reduce time to value. This includes the ability to search by:
    • Asset Type: Search for a Hive table, Storm topology or any connected component.
    • Tags: Search for all columns or tables that have a specific tag, such as PII.
    • Business Language: Search using terms aligned with compliance standards and policies.

The combination of these search capabilities empowers data stewards to construct a model of their organization and how it conducts business. This includes the ability to model a business by combining logical and physical data entities to develop a more complete understanding.
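
To make the search-by-asset-type and search-by-tag options above concrete, here is a minimal sketch that builds query URLs for a basic search request. The endpoint path and the parameter names (`typeName`, `classification`, `query`, `limit`) are assumptions based on our understanding of the Atlas v2 REST API; verify them against the documentation for your version. The host name and the `basic_search_url` helper are hypothetical, and the sketch only constructs URLs without contacting a server.

```python
from urllib.parse import urlencode

# Hypothetical Atlas server address; replace with your own.
ATLAS_BASE_URL = "http://atlas.example.com:21000"


def basic_search_url(type_name=None, classification=None, query=None, limit=25):
    """Build a basic-search URL filtered by asset type, tag and/or free text."""
    params = {}
    if type_name:
        params["typeName"] = type_name            # e.g. a Hive table type
    if classification:
        params["classification"] = classification  # e.g. a PII tag
    if query:
        params["query"] = query                    # free-text search terms
    params["limit"] = limit
    return f"{ATLAS_BASE_URL}/api/atlas/v2/search/basic?{urlencode(params)}"


# Search by asset type only.
print(basic_search_url(type_name="hive_table"))

# Search for columns that carry a specific tag.
print(basic_search_url(type_name="hive_column", classification="PII"))
```

The same filters are available interactively through the Atlas search UI, so a sketch like this is mainly useful for scripted audits.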

What's New in HDP 2.6

Cloud

  • Shared enterprise services for governance

Component Coverage

  • Tag-based policy support for HDFS, Kafka and HBase
  • Knox SSO for Atlas UI

Ease of Use

  • API revamp
  • Simplified UI for basic search
  • Manual entity creation – support for HDFS, HBase, Kafka & custom entity types etc.
  • Performance and scalability improvements
  • SmartSense metrics

Recent Progress with Atlas

The Atlas/Ranger integration represents a paradigm shift for big data governance and security. By integrating Atlas with Ranger, enterprises can now implement dynamic classification-based security policies in addition to role-based security. Ranger’s centralized platform empowers data administrators to define security policy based on Atlas metadata tags or attributes and apply this policy in real time to the entire hierarchy of data assets, including databases, tables and columns.
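
The idea of a classification-based policy applied across a hierarchy of assets can be sketched with a toy model. This is not Ranger's policy engine or API: the `Asset` class, the `is_allowed` function, the group names and the rule that a tag on a parent also covers its children are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    """A data asset (database, table or column) with optional tags."""
    name: str
    parent: Optional["Asset"] = None
    tags: set = field(default_factory=set)

    def effective_tags(self):
        """Tags set on this asset plus those inherited from its ancestors."""
        tags = set(self.tags)
        node = self.parent
        while node is not None:
            tags |= node.tags
            node = node.parent
        return tags


def is_allowed(asset, user_groups, tag_policies):
    """Deny access if any effective tag has a policy the user's groups do not satisfy."""
    for tag in asset.effective_tags():
        allowed_groups = tag_policies.get(tag)
        if allowed_groups is not None and not (allowed_groups & set(user_groups)):
            return False
    return True


# Hypothetical hierarchy: database -> table -> column.
db = Asset("sales")
table = Asset("customers", parent=db, tags={"PII"})
column = Asset("email", parent=table)

# Hypothetical policy: only the compliance group may read PII-tagged data.
policies = {"PII": {"compliance"}}

print(is_allowed(column, {"analysts"}, policies))
print(is_allowed(column, {"compliance"}, policies))
print(is_allowed(db, {"analysts"}, policies))
```

Because the policy is keyed on the tag rather than on the asset, tagging a new table as PII is enough to bring it and its columns under the same rule without writing a new policy.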

The latest release of Apache Atlas focuses on delivering scalable metadata services to model any business process, enhanced with industry-specific terminology, as well as the ability to import and export metadata from other systems and tools.

Apache Atlas Version Progress
Apache Atlas 0.7
  • Enterprise deployment
    • Performance enhancements
    • HA, DR and BC support
    • AD integration
  • Component lineage
    • Kafka/Storm
    • Sqoop
    • Falcon
  • Security
    • Support for Kerberos
    • Atlas/Ranger integration for dynamic tag-based security
  • User Interface
    • Improved GUI
    • Business catalog (Technical Preview)
  • Governance-ready partner ecosystem
Apache Atlas 0.6
  • Built-in types for HDFS
  • Metadata tag management
  • Expanded support for Apache Hive
Apache Atlas 0.5
  • Scalable metadata service
    • Enterprise/Business unit level modeling with industry-specific vocabulary
    • Extend visibility into HDFS Path, Hive DB, table, columns
    • Flexible access to Atlas services
  • Hive integration leverages existing metadata
    • Leverage existing metadata with import / export capability
    • Capture SQL runtime metrics directly
  • UI driven Hive table lineage and domain-specific search
    • Support for keyword, faceted and free text searches

Governance Ready Certification

To address enterprise requirements for Hadoop application integration, Atlas strives to foster a vibrant ecosystem based on a centralized metadata store. The Governance Ready program aims to create a curated group of partners that contribute a rich set of data management features focusing on data preparation, integration, cleansing, tagging, ETL visualization and collaboration areas.

 

Certified partners will help define a set of standards to exchange metadata and contribute conforming data integration features to the metadata store. Customers can then subscribe to desired features with low switching costs and faster deployment time.
