Agile enterprise compliance through metadata
Apache Atlas provides scalable, metadata-driven governance for Enterprise Hadoop. At its core, Atlas is designed to model new business processes and data assets with agility. Its flexible type system allows metadata to be exchanged with other tools and processes within and outside of the Hadoop stack, thereby enabling platform-agnostic governance controls that effectively address compliance requirements.
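To make the flexible type system concrete, the sketch below builds a minimal typed-entity payload of the kind a metadata service ingests. The payload shape is illustrative only, not the exact Atlas wire format (which varies across releases), and the custom type name `etl_process` and its attributes are hypothetical:

```python
import json

def build_entity(type_name, qualified_name, attributes=None):
    """Build a minimal Atlas-style entity payload for a custom type.

    Illustrative shape only; consult the Atlas REST API documentation
    for the exact schema of the version you run.
    """
    entity = {
        "typeName": type_name,
        "attributes": {"qualifiedName": qualified_name},
    }
    if attributes:
        entity["attributes"].update(attributes)
    return {"entity": entity}

# Model a hypothetical business-process asset:
payload = build_entity(
    "etl_process",                   # custom type registered in Atlas
    "nightly_orders_load@cluster1",  # unique name within the cluster
    {"owner": "data_engineering", "schedule": "0 2 * * *"},
)
print(json.dumps(payload, indent=2))
```

Because the payload is plain JSON, any tool inside or outside the Hadoop stack can produce or consume it, which is what makes the governance controls platform-agnostic.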
Apache Atlas is developed around two guiding principles:
Apache Atlas empowers enterprises to effectively and efficiently address their compliance requirements through a scalable set of core governance services. These services include:
Apache Atlas is designed to exchange metadata effectively within Hadoop and across the broader data ecosystem. Atlas's adaptive model reduces an enterprise's time to compliance by leveraging existing metadata and industry-specific taxonomies. With Atlas, data administrators and stewards can define, annotate, and automate the capture of relationships between data sets and their underlying elements, including source, target, and derivation processes.
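The source/target/derivation relationships described above are typically captured as a process record that references its input and output data sets. The sketch below mirrors that general model in a hedged way; the field names echo Atlas's conventions but are illustrative, and the qualified names are invented:

```python
def build_lineage(process_name, inputs, outputs):
    """Illustrative lineage record: a derivation process linking the
    input data sets to the outputs it produces from them."""
    ref = lambda qn: {"typeName": "DataSet",
                      "uniqueAttributes": {"qualifiedName": qn}}
    return {
        "typeName": "Process",
        "attributes": {
            "qualifiedName": process_name,
            "inputs": [ref(i) for i in inputs],
            "outputs": [ref(o) for o in outputs],
        },
    }

# Capture that a nightly job derives a mart table from a raw table:
lineage = build_lineage(
    "orders_to_mart@cluster1",
    inputs=["raw.orders@cluster1"],
    outputs=["mart.orders_daily@cluster1"],
)
```

Automating the emission of records like this from ETL jobs is what lets stewards see end-to-end lineage without manual cataloging.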
Atlas also ensures downstream metadata consistency across the ecosystem by enabling enterprises to easily export metadata to third-party systems.
Big Data brings democratization of information access and eases how information can be shared across the enterprise. However, unplanned growth can result in 'data swamps' with content that is not adequately tagged or cataloged. Business taxonomies can provide the missing link in closing this gap. From the Greek 'taxis', meaning 'order' and 'arrangement', taxonomies use a hierarchy of terms to classify and arrange concepts or physical/logical objects, making them the ideal vehicle to capture the structure of the entire domain of an enterprise's content.
Consistent classification and tagging across the enterprise using taxonomies supports system/platform interoperability and value generation from structured and unstructured data sources by mapping them to a common shared vocabulary. This authoritative reference taxonomy improves both data confidence and time to insight.
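A taxonomy's hierarchy of terms can be sketched as a simple tree in which each term points to its parent; tagging an asset with a narrow term then implicitly places it under every broader term as well. The taxonomy fragment below is hypothetical:

```python
# Hypothetical fragment of an enterprise taxonomy: child term -> parent term.
TAXONOMY = {
    "Finance.Payments.CardNumber": "Finance.Payments",
    "Finance.Payments": "Finance",
    "Finance": None,
}

def expand_terms(term):
    """Return the term plus all of its ancestors, so that searching for
    a broad term (e.g. 'Finance') also matches narrowly tagged assets."""
    terms = []
    while term is not None:
        terms.append(term)
        term = TAXONOMY.get(term)
    return terms

print(expand_terms("Finance.Payments.CardNumber"))
# ['Finance.Payments.CardNumber', 'Finance.Payments', 'Finance']
```

This ancestor expansion is what makes a shared vocabulary authoritative: one precise tag at ingest time answers both narrow and broad classification queries later.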
Requirements for a Big Data Business Catalog
The combination of these search capabilities empowers data stewards to construct a model of their organization and how it conducts business. These include the ability to model a business by combining both logical and physical data entities to develop a more complete understanding.
The Atlas/Ranger integration represents a paradigm shift for big data governance and security. By integrating Atlas with Ranger, enterprises can now implement dynamic, classification-based security policies in addition to role-based security. Ranger's centralized platform empowers data administrators to define security policy based on Atlas metadata tags or attributes and apply that policy in real time to the entire hierarchy of data assets, including databases, tables, and columns.
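The effect of a classification-based policy can be illustrated with a toy evaluator (this is not Ranger's actual policy engine, which also supports conditions, masking, and row filtering; the tag and group names are invented): any asset carrying a given Atlas tag is denied to users outside an allowed set, regardless of where the asset sits in the hierarchy.

```python
# Toy model of tag-based access control, for illustration only.
TAG_POLICIES = {
    "PII": {"allowed_groups": {"privacy_officers"}},
}

def is_access_allowed(asset_tags, user_groups):
    """Deny access if the asset carries a tag whose policy does not
    list any of the user's groups; otherwise allow."""
    for tag in asset_tags:
        policy = TAG_POLICIES.get(tag)
        if policy and not (policy["allowed_groups"] & set(user_groups)):
            return False
    return True

# A column tagged PII in Atlas is blocked for analysts immediately:
print(is_access_allowed({"PII"}, ["analysts"]))          # False
print(is_access_allowed({"PII"}, ["privacy_officers"]))  # True
```

The key design point is that the policy is keyed on the tag, not on individual tables or columns, so newly tagged assets are covered with no policy change.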
The latest release of Apache Atlas focuses on delivering scalable metadata services that can model any business process, enhanced with industry-specific terminology, along with the ability to import and export metadata from other systems and tools.
| Apache Atlas Version | Progress |
| --- | --- |
| Apache Atlas 0.7 | |
| Apache Atlas 0.6 | |
| Apache Atlas 0.5 | |
To address enterprise requirements for Hadoop application integration, Atlas strives to foster a vibrant ecosystem based on a centralized metadata store. The Governance Ready program aims to create a curated group of partners that contribute a rich set of data management features in the areas of data preparation, integration, cleansing, tagging, ETL visualization, and collaboration.
Certified partners will help define a set of standards to exchange metadata and contribute conforming data integration features to the metadata store. Customers can then subscribe to desired features with low switching costs and faster deployment time.