Streaming IN Hadoop: Yahoo! Releases Storm-YARN

Over the past year, customers have told us they want to store all their data in one place and interact with it in multiple ways. They want to use Hadoop, but for that to work, Hadoop needs to extend beyond batch: it also needs to be interactive and real-time, among other things.

This is the entire principle behind YARN, which Arun Murthy and the team at Hortonworks, together with others in the community, have been working on for more than 5 years! The YARN-based architecture of Hadoop 2.0 is hugely significant, and we have been working closely with many partners to incorporate it into their applications.

Storm-YARN Released as Open Source

Yahoo! has been testing Hadoop 2 and its YARN-based architecture for quite some time. All the while, they have worked on the convergence of the streaming framework Storm with Hadoop. This work has resulted in a YARN-based version of Storm that will radically improve performance and resource management for streaming.

We borrow from their blog post because they say it best…

Collocating real-time processing with batch processing offers a number of advantages over segregated clusters.

  • It provides a huge potential for elasticity. Real-time processing will rarely produce a constant and predictable load. As such, Storm needs more resources to keep up with spikes in demand. Collocating Storm with batch processing allows Storm to steal resources from batch jobs when needed and give them back when demand subsides. The Storm-YARN effort lays the groundwork to make this possible.
  • Many applications use Storm for low-latency processing and Map/Reduce for batch processing while sharing data between Storm and Map/Reduce. By placing Storm physically closer to the data source and/or other components in the same pipeline we can reduce network transfers and in turn the total cost of acquiring the data.
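
To make the elasticity argument concrete, here is a minimal toy sketch in Python. It is not YARN's scheduler and not the Storm-YARN API; the `SharedCluster` class, the slot counts, and the demand trace are all hypothetical, chosen only to illustrate how a streaming job on a shared cluster could borrow capacity from batch work during a spike and hand it back afterwards.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SharedCluster:
    """Toy model of one cluster shared by a streaming job and batch jobs.

    Hypothetical illustration only; real YARN scheduling is far richer.
    """
    total_slots: int
    streaming_min: int  # slots the streaming job always keeps

    def allocate(self, streaming_demand: int) -> Tuple[int, int]:
        """Return (streaming_slots, batch_slots) for the current demand."""
        # Streaming gets what it asks for, bounded below by its floor
        # and above by the size of the cluster.
        streaming = max(self.streaming_min,
                        min(streaming_demand, self.total_slots))
        # Batch work takes whatever is left over.
        batch = self.total_slots - streaming
        return streaming, batch


cluster = SharedCluster(total_slots=100, streaming_min=10)

# Assumed demand trace: quiet, ramp up, spike, ramp down, quiet again.
demand_trace = [5, 20, 80, 30, 5]
allocations = [cluster.allocate(d) for d in demand_trace]

for demand, (streaming, batch) in zip(demand_trace, allocations):
    print(f"demand={demand:3d} streaming={streaming:3d} batch={batch:3d}")
```

In this sketch the batch share shrinks while streaming demand spikes and recovers once demand subsides. On segregated clusters, by contrast, the streaming cluster would have to be sized for the peak and would sit partly idle the rest of the time.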

YARN as the basis of Hadoop 2.0 Architecture

We are excited about this development because it reinforces our approach of enabling the broader ecosystem of Hadoop-based applications, and our belief that an open community is the fastest path to this innovation. It is amazing to watch the pace of innovation that is occurring, and we know we are still in the very early days of this evolution of technologies around Hadoop to meet the needs of the broad enterprise.

We are also excited about Storm-YARN because it is yet another application to move IN Hadoop. We already have SQL-IN-Hadoop for interactive queries with Stinger / Tez, Continuuity and WEAVE, and now Storm-IN-Hadoop for streaming! We look forward to a summer full of innovation around YARN.


Comments

June 13, 2013 at 2:04 pm

Hello!

Just a question: what about Spark + YARN?

Cheers.
