With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. As more data flows into and through a Hadoop cluster to feed these engines, Apache Falcon is a crucial framework for simplifying data management and pipeline processing.
Falcon enables data architects to automate the movement and processing of datasets for ingest, pipeline, disaster recovery and data retention use cases.
We recently released Apache Falcon 0.6.0. With this release, the community addressed more than 220 JIRA issues. Among these many bug fixes, improvements and new features, four stand out as particularly important:
Apache Falcon now supports access control lists (ACLs) that provide authorization for Feed, Cluster and Process entities. This allows Falcon to leverage existing security work to maintain consistent controls throughout the HDP stack. This security enhancement lays the foundation for broader enterprise adoption and a variety of new use cases that will flow from that.
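As a sketch of what this looks like in practice, an entity definition can carry an ACL element naming the owner, group, and permission that Falcon checks during authorization. The feed name, owner, and group below are hypothetical placeholders:

```xml
<!-- Hypothetical excerpt from a Falcon feed entity definition.
     With authorization enabled, Falcon uses the ACL element to decide
     who may view or modify the entity. -->
<feed name="rawEmailFeed" xmlns="uri:falcon:feed:0.1">
  ...
  <ACL owner="etl-user" group="analytics" permission="0755"/>
  ...
</feed>
```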
This Falcon release provides better access to lineage metadata, enabling quick, efficient search and retrieval of lineage information. That makes it easier to comply with data retention and discoverability regulations.
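For illustration, lineage can be explored as a graph of vertices (entities, instances) and edges (relationships). The commands below are a hedged sketch only: the host, port, entity name, and vertex id are placeholders, and the exact REST paths are assumptions about the metadata API rather than confirmed syntax:

```shell
# Hypothetical sketch: querying Falcon's lineage graph over REST.
FALCON_URL=http://falcon-host:15443

# Look up the lineage vertex for a process by name (placeholder name).
curl "$FALCON_URL/api/metadata/lineage/vertices?key=name&value=cleanseEmailProcess"

# Walk a vertex's incoming edges to find its upstream feeds
# (vertex id 4 is a placeholder from the previous response).
curl "$FALCON_URL/api/metadata/lineage/vertices/4/in"
```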
Falcon can now leverage cloud infrastructure such as Amazon S3 and Microsoft Azure. We’re excited about this change because it extends the archive use case for business continuity and ad hoc analysis.
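One way to picture this is a feed replicated from an on-premises cluster to a cloud archive with a longer retention window. The fragment below is a sketch, not a tested definition: the cluster names, dates, bucket, and paths are all hypothetical:

```xml
<!-- Hypothetical sketch: a feed's clusters section replicating data
     from a primary Hadoop cluster to an S3-backed archive target. -->
<clusters>
  <cluster name="primaryCluster" type="source">
    <validity start="2014-12-01T00:00Z" end="2016-12-01T00:00Z"/>
    <retention limit="days(30)" action="delete"/>
  </cluster>
  <cluster name="cloudArchive" type="target">
    <validity start="2014-12-01T00:00Z" end="2016-12-01T00:00Z"/>
    <retention limit="months(36)" action="delete"/>
    <locations>
      <location type="data"
                path="s3://example-archive-bucket/falcon/emails/${YEAR}-${MONTH}-${DAY}"/>
    </locations>
  </cluster>
</clusters>
```

The short on-cluster retention plus the long archive retention is what gives the continuity and ad hoc analysis benefit described above.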
A Falcon recipe is a static process template with a parameterized workflow that realizes a specific use case. Recipes are defined in the user space. Every recipe can be modeled as a Process within Falcon, which then periodically executes the user workflow. Because the process and its associated workflow are parameterized, the user provides a properties file with name/value pairs that Falcon substitutes before scheduling. Falcon translates a recipe into a process entity by replacing the parameters in the workflow definition. Recipes enable non-programmers to capture and re-use very complex business logic.
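The name/value substitution can be sketched with a properties file like the one below. The property names, cluster names, and paths are illustrative assumptions, not the exact keys a shipped recipe expects:

```properties
# Hypothetical recipe properties file: name/value pairs that Falcon
# substitutes into the parameterized workflow template before
# scheduling the generated process.
falcon.recipe.name=hdfs-replication-monthly
sourceCluster=primaryCluster
targetCluster=backupCluster
sourceDir=/apps/falcon/demo/data
targetDir=/apps/falcon/demo/archive
frequency=months(1)
```

A non-programmer fills in a file like this once; Falcon then generates and schedules the process entity with those values baked into the workflow.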
We want to thank the Apache Falcon community for all of its hard work delivering this release. Looking forward to future releases, the Apache Falcon team plans: