April 20, 2015

Find, Understand, and Govern Data in Hadoop

Waterline Data is a Hortonworks Technology Partner that recently earned HDP Certified and YARN Ready status for its solution, which automates the inventory of data assets in the data lake, enables data governance, and gives data engineers and data scientists self-service access to find and understand their data. Learn more by joining the upcoming webinar on May 6, or by downloading the Sandbox tutorial or the joint whitepaper. Our guest blogger is Oliver Claude, CMO at Waterline Data.

Apache Hadoop promises to unlock new business value for enterprises. Hadoop provides a powerful platform for data science and analytics, where data engineers and data scientists can leverage myriad data from external and internal data sources to uncover new insight. Data stored in Hadoop is available via a centralized architecture allowing access from any application and for any user. This type of deployment is often called a data lake.

This power also brings new challenges, particularly as data lakes grow. The business wants more and more self-service, while IT tries to keep up with the demand for data and still maintain architecture and data governance standards. In other words, self-service needs to be combined with automation and governance.

Automation and Governance

The metaphor that comes to mind for such a solution is a large online retailer. The retailer is supported by a complete and automated inventory and catalog of all its products. It also makes it very easy for users to find, understand, and get the products they want. Lastly, there is end-to-end governance to ensure accurate product information and secure transactions.

At Waterline Data, that model inspired us, and we built a product that plays the same role for data in Hadoop and the Hortonworks Data Platform (HDP).


Waterline Data provides a unique combination of automation and machine learning to:

  1. Automatically inventory every file and field in the entire data lake
  2. Let data engineers and data scientists find and understand the best suited and most trusted data without having to explore each file manually
  3. Provision the data securely
  4. Enable data governance throughout including the discovery of data lineage, compliance metadata, and business metadata.
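To make the first item concrete, here is a minimal sketch of what "inventorying every file and field" can mean in its simplest form. It is not Waterline Data's implementation. It assumes CSV files with header rows on a local filesystem rather than HDFS, and the names `infer_type` and `inventory_csv_files` are hypothetical, chosen only for illustration.

```python
import csv
from pathlib import Path


def infer_type(values):
    """Guess a coarse type for a column from its non-empty sample values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    # Try the strictest type first; fall through to "string" if none fits.
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def inventory_csv_files(root, sample_rows=100):
    """Map each CSV file under root to a list of (field, inferred type) pairs.

    Assumption: every file has a header row and is small enough to sample
    from a local path; a real data lake would need an HDFS client instead.
    """
    catalog = {}
    for path in sorted(Path(root).rglob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            fields = reader.fieldnames or []
            # Read at most sample_rows rows so large files stay cheap to scan.
            rows = [row for _, row in zip(range(sample_rows), reader)]
        catalog[path.relative_to(root).as_posix()] = [
            (field, infer_type([row.get(field) or "" for row in rows]))
            for field in fields
        ]
    return catalog
```

Calling `inventory_csv_files("/data/landing")` (a hypothetical path) returns a dictionary keyed by relative file path, which is the kind of raw catalog that search, tagging, and governance features could then build on.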

Waterline Data is HDP Certified

Waterline Data has also invested in optimizing the product for the Hortonworks Data Platform, and is an HDP Certified Technology Partner. As a result, Waterline Data running on HDP helps turn the data lake into a business-ready data lake and prevents a data swamp from forming.

Hortonworks Sandbox

You can get hands-on with Waterline Data Inventory on a Hortonworks cluster by downloading the Waterline on Hortonworks Sandbox and the accompanying tutorial to find, understand, and govern data in Hadoop.

Download the Sandbox Tutorial with Waterline Data: Manage Your Data Lake More Efficiently.

Join the Webinar

Waterline Data and Hortonworks will host a webinar on May 6 at 10 am PT, “Implementing a Data Lake with Enterprise Grade Data Governance.” Register Here.
