Apache Storm and Hadoop

An Update on Storm v0.9.1

In February 2014, the Apache Storm community released Storm version 0.9.1. Storm is a distributed, fault-tolerant, high-performance real-time computation system that provides strong guarantees on the processing of data. Hortonworks already supports customers who use this important project.

Many organizations already use Storm, including our partner Yahoo!. Apache Storm 0.9.1 is:

  • Highly scalable. Like Hadoop, Storm scales linearly.
  • Fault-tolerant. Storm automatically reassigns tasks if a node fails.
  • Reliable. Storm supports “at least once” and “exactly once” processing semantics.
  • Language agnostic. Processing logic can be defined in any language (e.g. Ruby, Python, JavaScript, Perl).
  • An Apache project. This brings with it the brand, governance, and large community of the Apache Software Foundation.
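To make the "language agnostic" point concrete, here is a minimal sketch of how a non-JVM component might talk to Storm. It assumes the multi-language convention in which each message is a JSON object followed by a line containing only "end", exchanged over stdin and stdout. The function names (`read_message`, `send_message`, `run_split_bolt`) and the sentence-splitting logic are hypothetical, and the sketch deliberately leaves out the initial handshake and the task-id replies that a real component would also have to handle.

```python
import io
import json


def read_message(stream):
    """Read one JSON message terminated by a line containing only 'end'."""
    lines = []
    for line in stream:
        if line.strip() == "end":
            return json.loads("".join(lines))
        lines.append(line)
    return None  # stream closed before a complete message arrived


def send_message(stream, message):
    """Write one JSON message followed by the 'end' terminator line."""
    stream.write(json.dumps(message) + "\nend\n")


def run_split_bolt(stdin, stdout):
    """Hypothetical bolt: split each incoming sentence into words.

    Every word is emitted anchored to the input tuple, then the input
    tuple is acked so it counts as fully processed.
    Assumption: handshake and task-id replies are omitted for brevity.
    """
    while True:
        tup = read_message(stdin)
        if tup is None:
            break
        sentence = tup["tuple"][0]
        for word in sentence.split():
            send_message(stdout, {
                "command": "emit",
                "anchors": [tup["id"]],
                "tuple": [word],
            })
        send_message(stdout, {"command": "ack", "id": tup["id"]})


if __name__ == "__main__":
    # Simulate Storm's side of the pipe with in-memory streams.
    incoming = io.StringIO(
        '{"id": "t1", "comp": "1", "stream": "default", '
        '"task": 9, "tuple": ["hello storm"]}\nend\n'
    )
    outgoing = io.StringIO()
    run_split_bolt(incoming, outgoing)
    print(outgoing.getvalue())
```

Anchoring each emitted word to the input tuple's id is what would let Storm replay the sentence if a downstream step fails, which ties back to the "at least once" guarantee in the list above.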

Netty-based Messaging Transport

The biggest code change in version 0.9.1 was the move away from the 0MQ transport in favor of a pure-Java, Netty-based transport. Special thanks to the engineering team at Yahoo! for contributing that.

Previously, installing the 0MQ native binaries proved difficult for many users. The pure-Java solution cures that headache. Netty also improves Storm’s performance over 0MQ, allowing twice as many messages per second through the same cluster.

That said, the 0MQ transport remains an available and supported option for those who want to use it.
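As a rough illustration of what choosing a transport looks like in practice, the sketch below renders the relevant `storm.yaml` entries. The setting names and class names are the ones recalled from Storm 0.9.x configuration and should be checked against the `defaults.yaml` shipped with your release; the numeric values are placeholders, not tuning advice. The helpers `transport_settings` and `to_yaml_lines` are hypothetical and exist only for this example.

```python
# Assumption: class names as used by Storm 0.9.x; verify against your release.
NETTY_CONTEXT = "backtype.storm.messaging.netty.Context"
ZMQ_CONTEXT = "backtype.storm.messaging.zmq"


def transport_settings(use_netty=True):
    """Return the storm.yaml settings for the chosen messaging transport."""
    settings = {
        "storm.messaging.transport": NETTY_CONTEXT if use_netty else ZMQ_CONTEXT,
    }
    if use_netty:
        # Illustrative values only; tune them for your own cluster.
        settings.update({
            "storm.messaging.netty.server_worker_threads": 1,
            "storm.messaging.netty.client_worker_threads": 1,
            "storm.messaging.netty.buffer_size": 5242880,
            "storm.messaging.netty.max_retries": 100,
            "storm.messaging.netty.max_wait_ms": 1000,
            "storm.messaging.netty.min_wait_ms": 100,
        })
    return settings


def to_yaml_lines(settings):
    """Render flat settings as 'key: value' lines, quoting string values."""
    lines = []
    for key, value in settings.items():
        rendered = '"%s"' % value if isinstance(value, str) else str(value)
        lines.append("%s: %s" % (key, rendered))
    return lines


if __name__ == "__main__":
    print("\n".join(to_yaml_lines(transport_settings(use_netty=True))))
```

Calling `transport_settings(use_netty=False)` yields only the transport line pointing at the 0MQ implementation, which still requires the native binaries mentioned above.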

Windows Platform Support

This is the first release of Storm with built-in Windows support. It is an important step for those who have invested in a Windows-based infrastructure and want to use Storm for real-time stream processing.

Hortonworks Data Platform is the only Hadoop distribution that supports Windows. Now that Storm is part of HDP, it will also run on Windows.

Apache Maven for Storm Builds

From a developer perspective, we migrated our build tool from Leiningen to Apache Maven. This was the right thing to do for release management: Maven offered more options for integrating Storm’s build process with the ASF release infrastructure.

Now we’re in a much better position to release early and often.

Coming Next: Security, Multi-tenancy and Storm-On-YARN

Now that we have our first Apache release out, we’re in a better position to work on what matters most to our users: improving Storm and adding new features.

A focus for upcoming releases will be security and multi-tenancy. The engineering team at Yahoo! has contributed a tremendous amount of work in that regard, and we’ll be looking to get those features added to the main codebase.

There is also a lot of interest in support for running Storm on YARN. Again, Yahoo! has done a lot of work in this area, and has open-sourced a preliminary implementation of Storm on YARN.

Storm Comes to the Apache Software Foundation (ASF)

This is Storm’s first release from the Apache Software Foundation. The ASF ensures that released software adheres to a stringent set of licensing and distribution rules that protect both the users of the software and the contributing developers.

Thanks to the Team

Many people worked hard to bring Storm into the ASF and to release version 0.9.1. Thanks to the following folks who made this release possible: Andy Feng, David Lao, Derek Dagit, Flip Kromer, James Xu, Jason Jackson, Nathan Marz, and Robert Evans.

DOWNLOADS: http://storm.incubator.apache.org/downloads.html

RELEASE NOTES: https://git-wip-us.apache.org/repos/asf?p=incubator-storm.git;a=blob_plain;f=CHANGELOG.md;hb=254ec135b9a67b1e7bc8e979356274aee2e7d715


Comments

murat
|
April 30, 2014 at 6:36 am
|

Hi, how do I install Apache Storm on Windows?
