Project Falcon: Tackling Hadoop Data Lifecycle Management via Community Driven Open Source

Today we are excited to see another example of the power of community at work as we highlight the newly approved Apache Software Foundation incubator project named Falcon. This incubation project was initiated by the team at InMobi together with engineers from Hortonworks. Falcon is useful to anyone building apps on Hadoop as it simplifies data management through the introduction of a data lifecycle management framework.

All About Falcon and Data Lifecycle Management

Falcon is a data lifecycle management framework for Apache Hadoop that enables users to configure, manage and orchestrate data motion, disaster recovery, and data retention workflows in support of business continuity and data governance use cases.

Falcon at a glance

Falcon’s goal is to simplify data management on Hadoop, which it achieves by providing important data lifecycle management services that any Hadoop application can rely on. Instead of hard-coding complex data lifecycle capabilities, apps can now rely on a proven, well-tested and extremely scalable data management system built specifically for the unique capabilities that Hadoop offers.

For example, consider the challenge of preparing raw data so that it can be consumed by business intelligence (BI) applications. In addition to this routine use case, suppose you also want to replicate data to a failover cluster that is smaller than the primary cluster. In this case you probably want to replicate only the staged data and the data presented to BI applications, relying on the primary cluster to be the sole source of intermediate data.

We see our customers building solutions like this, but they are tricky to develop, difficult to test, and error-prone. With Falcon, however, the data processing pipeline and all replication points are expressed in a single configuration file, and well-tested Falcon services ensure that data is processed and replicated reliably. Using Falcon speeds app development and improves overall quality.
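To make the idea of declaring replication points alongside the pipeline more concrete, here is a minimal Python sketch. It is not Falcon's configuration format or API. The `Dataset` class, the dataset names, and the retention values are all hypothetical and chosen only to mirror the failover example above: staged and presented data are replicated, while intermediate data lives on the primary cluster only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """One data set in the pipeline (hypothetical model, not a Falcon entity)."""
    name: str
    replicate: bool       # should this data set be copied to the failover cluster?
    retention_days: int   # illustrative retention period, not from the article


# Hypothetical pipeline, declared in one place: staged -> intermediate -> presented.
PIPELINE = [
    Dataset("staged", replicate=True, retention_days=90),
    Dataset("cleansed", replicate=False, retention_days=7),    # intermediate
    Dataset("conformed", replicate=False, retention_days=7),   # intermediate
    Dataset("presented", replicate=True, retention_days=365),
]


def datasets_to_replicate(pipeline):
    """Return the names of data sets that must exist on the failover cluster."""
    return [d.name for d in pipeline if d.replicate]


def primary_only(pipeline):
    """Return the names of data sets kept on the primary cluster only."""
    return [d.name for d in pipeline if not d.replicate]


if __name__ == "__main__":
    print("replicate:", datasets_to_replicate(PIPELINE))
    print("primary only:", primary_only(PIPELINE))
```

The point of the sketch is the design choice rather than the code itself: replication and retention are declared as data next to each pipeline stage, so a shared service can act on them instead of every application hard-coding its own copy logic.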

The Power of Community

Unwavering belief in the power of community-driven open source software is the cornerstone of Hortonworks’ approach. As we discussed in “The Road Ahead for Hortonworks and Hadoop”, one of the key areas of investment for enterprise Hadoop this year is features that address the business continuity and data governance needs of the mainstream enterprise.

New but proven

The team at InMobi (who have been significant contributors to Apache Hadoop since their inception) couldn’t agree more, which is why they built Falcon for their own use almost 18 months ago. Having proved it at scale in their production environment for more than 12 months, managing hundreds of data feeds into and out of Hadoop, they have now contributed this technology to the Apache Software Foundation so that the entire community may benefit.

We are thrilled to welcome Falcon as yet another example of the relentless march of innovation that is community-driven, open source Apache Hadoop, and we hope you will join us on the journey.

Interested in learning more? Additional details of the project are on the InMobi blog. Please visit the Falcon incubator website and get involved!
