December 06, 2013

Apache Falcon Technical Preview Available Now

We believe the fastest path to innovation is the open community, and we work hard to help deliver this innovation from the community to the enterprise. However, this is a two-way street: we are also hearing very distinct requirements voiced by the broader enterprise as it integrates Hadoop into its data architecture.

Take a look at the Falcon Technical Preview and the Data Management Labs.

Open Source, Open Community & An Open Roadmap for Dataset Management

Over the past year, a set of enterprise requirements has emerged for dataset management. Organizations need to process and move datasets (whether HDFS files or Hive tables) in, around, and between clusters. This task can start innocently enough but usually (and quickly) becomes very complex. Dataset locations, cluster interfaces, and replication and retention policies can all change over time. Hand-coding this logic into your applications — along with general retry and late-data-arrival logic — can become a slippery slope of complexity. Getting it right the first time is a challenge; maintaining the end result can be downright impossible.

To meet these requirements we will, as always, work within the open community, and we have introduced a Hortonworks Labs initiative to make dataset management easier. This initiative outlines a public roadmap of features that will help Hadoop users avoid the complexity of processing and managing datasets. Much of this work is centered on Apache Falcon, which provides a declarative framework for describing data pipelines, simplifying the development of processing solutions. By using Falcon, users can describe dataset processing pipelines in a way that maximizes reuse and consistency, while insulating them from implementation details across datasets and clusters.
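To illustrate the declarative style, the sketch below shows roughly what a Falcon feed entity can look like: an XML definition that attaches a schedule, a retention policy, and an HDFS location to a dataset, so that none of this logic needs to be hand-coded into applications. The entity, cluster, and path names here are hypothetical, and the exact schema may vary between Falcon releases.

```xml
<!-- Hypothetical Falcon feed entity: an hourly raw-input dataset on a
     source cluster, retained for 30 days and then deleted by Falcon. -->
<feed name="rawInput" description="hourly raw input" xmlns="uri:falcon:feed:0.1">
  <!-- How often new instances of this dataset arrive -->
  <frequency>hours(1)</frequency>

  <clusters>
    <!-- The cluster(s) this feed lives on; replication targets would be
         added here as additional <cluster> elements of type "target" -->
    <cluster name="primaryCluster" type="source">
      <validity start="2013-12-01T00:00Z" end="2014-12-01T00:00Z"/>
      <retention limit="days(30)" action="delete"/>
    </cluster>
  </clusters>

  <locations>
    <!-- Date-partitioned HDFS path; Falcon expands the variables per instance -->
    <location type="data" path="/data/raw/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
  </locations>

  <ACL owner="etl-user" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>
```

A process entity would then reference feeds like this one as inputs and outputs, which is how pipelines stay decoupled from the physical details of any one dataset or cluster.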

We invite you to review and follow the roadmap in our Labs area and also encourage you to get involved in the community.

If you want to get started with some of these tools today, we have made a Falcon Technical Preview available.

