Apache Falcon Technical Preview Available Now

A framework for simplifying data management and pipeline processing

We believe the fastest path to innovation is the open community, and we work hard to help deliver this innovation from the community to the enterprise.  However, this is a two-way street: we are also hearing very distinct requirements from enterprises as they integrate Hadoop into their data architectures.

Take a look at the Falcon Technical Preview and the Data Management Labs.

Open Source, Open Community & An Open Roadmap for Dataset Management

Over the past year, a set of enterprise requirements has emerged for dataset management.  Organizations need to process and move datasets (whether HDFS files or Hive tables) in, around and between clusters. This task can start innocently enough but usually (and quickly) becomes very complex. Dataset locations, cluster interfaces, and replication and retention policies can all change over time. Hand-coding this logic into your applications, along with general retry and late-data-arrival logic, can become a slippery slope of complexity. Getting it right the first time can be a challenge, but maintaining the end result can be downright impossible.

To meet these requirements we will, as always, work within the community, and we have introduced a Hortonworks Labs initiative to make dataset management easier. This initiative lays out a public roadmap of features that will help Hadoop users avoid the complexity of processing and managing datasets. Much of the work is outlined in Apache Falcon, which provides a declarative framework for describing data pipelines to simplify the development of processing solutions. By using Falcon, users can describe dataset processing pipelines in a way that maximizes reuse and consistency, while insulating them from implementation details across datasets and clusters.
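To make the declarative idea a little more concrete, here is a minimal Python sketch. It is not Falcon's entity format or API; `FeedSpec`, `expired_instances`, `replication_plan`, the path template and the cluster name are all hypothetical names chosen for illustration. The point is only that retention and replication are stated once as data, and generic code derives the actions, so a policy change means editing the description rather than the application.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FeedSpec:
    """Hypothetical declarative description of a dataset (not Falcon's schema)."""
    name: str
    path_template: str          # where each dated instance lives
    retention: timedelta        # how long instances are kept
    replicate_to: tuple = ()    # names of target clusters (assumed labels)


def expired_instances(spec, instance_dates, now):
    """Return paths of instances that fall outside the retention window."""
    cutoff = now - spec.retention
    return [
        spec.path_template.format(date=d.strftime("%Y-%m-%d"))
        for d in sorted(instance_dates)
        if d < cutoff
    ]


def replication_plan(spec, instance_date):
    """Return (target_cluster, path) pairs for one dataset instance."""
    path = spec.path_template.format(date=instance_date.strftime("%Y-%m-%d"))
    return [(cluster, path) for cluster in spec.replicate_to]


# The policy lives in one place; the functions above never change with it.
clicks = FeedSpec(
    name="clicks",
    path_template="/data/clicks/{date}",
    retention=timedelta(days=2),
    replicate_to=("backup-cluster",),
)
```

Changing the retention window or adding a second target cluster here touches only the `clicks` description, which is the kind of reuse and insulation the paragraph above describes.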

We invite you to review and follow the roadmap in our Labs area and also encourage you to get involved in the community.

If you want to get started today using some of these tools, we have made a Falcon Technical Preview available.
