Project Falcon: Tackling Hadoop Data Lifecycle Management via Community-Driven Open Source
Today we are excited to see another example of the power of community at work as we highlight Falcon, a newly approved Apache Software Foundation (ASF) incubator project. The project was initiated by the team at InMobi together with engineers from Hortonworks. Falcon is useful to anyone building apps on Hadoop because it simplifies data management by introducing a data lifecycle management framework.
All About Falcon and Data Lifecycle Management
Falcon is a data lifecycle management framework for Apache Hadoop that enables users to configure, manage and orchestrate data motion, disaster recovery, and data retention workflows in support of business continuity and data governance use cases.
Falcon aims to simplify data management on Hadoop by providing core data lifecycle management services that any Hadoop application can rely on. Instead of hard-coding complex data lifecycle logic, apps can rely on a proven, well-tested and highly scalable data management system built specifically around the capabilities that Hadoop offers.
For example, consider the challenge of preparing raw data so that it can be consumed by business intelligence (BI) applications. In addition to this routine use case, suppose you also want to replicate data to a failover cluster that is smaller than the primary cluster. In this case, you probably want to replicate only the staged data and the data presented to BI applications, relying on the primary cluster as the sole source of intermediate data.
We see customers building solutions like this, but they are tricky to develop, difficult to test and error-prone. With Falcon, however, the data processing pipeline and all replication points are expressed in a single configuration file, and well-tested Falcon services ensure that data is processed and replicated reliably. Using Falcon, you can speed up app development while improving overall quality.
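To make the single-configuration idea from the example above more concrete, here is a minimal Python sketch. It is not Falcon's actual entity format or API: the `Dataset` and `PipelineSpec` classes, the stage labels, and the cluster and dataset names are all hypothetical. It declares every dataset in one spec, marks which datasets replicate to the smaller failover cluster, derives the replication plan from that single declaration, and rejects any attempt to replicate intermediate data.

```python
from dataclasses import dataclass, field

# Hypothetical model of a single pipeline definition; this is NOT Falcon's
# real entity format or API. All names and stage labels are illustrative.
STAGES = ("raw", "staged", "intermediate", "presented")


@dataclass(frozen=True)
class Dataset:
    name: str
    stage: str                          # one of STAGES
    replicate_to: tuple[str, ...] = ()  # clusters that should receive a copy


@dataclass
class PipelineSpec:
    primary: str
    datasets: list[Dataset] = field(default_factory=list)

    def validate(self) -> None:
        """Enforce the example's policy: intermediate data stays on the primary."""
        for ds in self.datasets:
            if ds.stage not in STAGES:
                raise ValueError(f"{ds.name}: unknown stage {ds.stage!r}")
            if ds.stage == "intermediate" and ds.replicate_to:
                raise ValueError(f"{ds.name}: intermediate data must not be replicated")

    def replication_plan(self) -> dict[str, list[str]]:
        """Map each target cluster to the datasets it should receive."""
        self.validate()
        plan: dict[str, list[str]] = {}
        for ds in self.datasets:
            for target in ds.replicate_to:
                plan.setdefault(target, []).append(ds.name)
        return plan


spec = PipelineSpec(
    primary="primary-cluster",
    datasets=[
        Dataset("clicks_raw", "raw"),
        Dataset("clicks_staged", "staged", replicate_to=("failover-cluster",)),
        Dataset("clicks_sessionized", "intermediate"),
        Dataset("clicks_daily_summary", "presented", replicate_to=("failover-cluster",)),
    ],
)

print(spec.replication_plan())
# {'failover-cluster': ['clicks_staged', 'clicks_daily_summary']}
```

Because each replication target sits next to the dataset it applies to, the policy from the example (staged and presented data go to the failover cluster, intermediate data does not) is checked in one place rather than spread across hand-written copy jobs.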
The Power of Community
Unwavering belief in the power of community-driven open source software is the cornerstone of Hortonworks’ approach. As we discussed in “The Road Ahead for Hortonworks and Hadoop”, a key area of investment for enterprise Hadoop this year is delivering features that address the business continuity and data governance needs of mainstream enterprises.
New but Proven
The team at InMobi (who have been significant contributors to Apache Hadoop since their inception) couldn’t agree more, which is why they built Falcon for their own use almost 18 months ago. Having proven Falcon at scale in their production environment for more than 12 months, where it manages hundreds of data feeds into and out of Hadoop, InMobi has now contributed the technology to the Apache Software Foundation so that the entire community can benefit.
We are thrilled to welcome Falcon as yet another example of the relentless march of innovation that is community-driven, open source Apache Hadoop, and we hope you will join us on the journey.