Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop

Follow-up on Apache Falcon for data governance in Hadoop

On Wednesday, May 21, Himanshu Bari (Hortonworks’ senior product manager), Venkatesh Seetharam (committer to Apache Falcon), and Justin Sears (Hortonworks’ product marketing manager) hosted the third of our seven Discover HDP 2.1 webinars. Himanshu and Venkatesh discussed data governance in Hadoop through Apache Falcon, which is included in HDP 2.1. As most of you know, ingesting data into Hadoop is one thing; governing that data by defining and enforcing data-pipeline policies is another, and it is a necessity in the enterprise.

In the webinar, the speakers:

  • Explained why you need Apache Falcon
  • Described some key new Falcon features
  • Showed a demo highlighting how to:
    • define data pipelines with replication
    • declare policies for retention and late data arrival
    • manage the Falcon server with Ambari
  • Answered questions

If you missed the webinar, here is the complete recording.

And here is the presentation deck.

Webinar Q & A

Q: What version of HDP is Falcon supported in?
A: We recently shipped HDP 2.1, and Apache Falcon is part of that GA release.

Q: Can you use the Falcon UI to manage Falcon entities and pipelines?
A: Today the Falcon UI is read-only, so you cannot edit entities or pipelines through it. Editing is something we are working on, and it will be available soon.

Q: Ambari is not supported on Ubuntu yet (AFAIK). What about Falcon?
A: You are correct: Ambari support on Ubuntu is in the works, but Falcon already comes with debs, so you can install it outside of Ambari. Note that HDP itself is supported on Ubuntu today; Ambari will have Ubuntu support in the near future.

Q: How do I manage a Falcon server without Ambari today? Should I use the Falcon UI?
A: Today you cannot manage or monitor Falcon through Ambari on Ubuntu. Falcon has a minimal dashboard for monitoring jobs, and you will soon be able to create and manage pipelines in the UI.

Q: Do we have a UI for all this configuration?
A: We are working on the UI to enhance configuration management.

Q: For this demo, are you using Ambari 1.5.1?
A: Yes. We showed Ambari 1.5.1.

Q: Does Apache Falcon run on earlier versions of HDP too, such as HDP 1.3?
A: No, that is not a supported configuration.

What’s Next?

Visit our Data Governance and Integration labs and Apache Falcon page to learn more.

Attend our next Discover HDP 2.1 webinar on Wednesday, May 28 at 9 am Pacific Time: Apache Hadoop 2.4.0, YARN, and HDFS.

If you have any further questions about Apache Falcon (documentation, code examples, tutorials), please post them on the Community forums under Falcon.
