Apache Storm

A system for processing streaming data in real time

Apache™ Storm is a distributed real-time computation system for processing fast, large streams of data. Storm adds reliable real-time data processing capabilities to Apache Hadoop® 2.x. Storm in Hadoop helps businesses capture new opportunities with low-latency dashboards, security alerts, and operational enhancements that integrate with the other applications running in the same Hadoop cluster.

What Storm Does

Now that Storm and MapReduce run together in Hadoop on YARN, a Hadoop cluster can efficiently process a full range of workloads, from real-time to interactive to batch. Storm is simple, and developers can write Storm topologies using any programming language.

Five characteristics make Storm ideal for real-time data processing workloads. Storm is:

  • Fast – benchmarked as processing one million 100-byte messages per second per node
  • Scalable – with parallel calculations that run across a cluster of machines
  • Fault-tolerant – when workers die, Storm will automatically restart them. If a node dies, the worker will be restarted on another node.
  • Reliable – Storm guarantees that each unit of data (tuple) will be processed at least once or exactly once. Messages are only replayed when there are failures.
  • Easy to operate – standard configurations are suitable for production on day one, and a deployed cluster needs little ongoing attention.
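
The "at least once" guarantee in the list above can be illustrated with a toy model. The sketch below is plain Python and does not use the Storm API; the names ReplayingSource, run, and make_flaky_processor are hypothetical and exist only for this illustration. It assumes a source that remembers every message it has emitted until that message is acknowledged, and replays a message only when processing fails.

```python
from collections import deque


class ReplayingSource:
    """Toy source that re-emits a message until it is acknowledged."""

    def __init__(self, messages):
        self.queue = deque(enumerate(messages))  # (message id, payload) pairs
        self.pending = {}                        # emitted but not yet acked

    def next_message(self):
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        # Processing succeeded: forget the message.
        self.pending.pop(msg_id)

    def fail(self, msg_id):
        # Processing failed: queue the message again so it is replayed.
        self.queue.append((msg_id, self.pending.pop(msg_id)))


def run(source, process):
    """Drive the source until every message has been acknowledged."""
    attempts = []
    while True:
        item = source.next_message()
        if item is None:
            return attempts
        msg_id, payload = item
        attempts.append(payload)
        try:
            process(payload)
            source.ack(msg_id)
        except RuntimeError:
            source.fail(msg_id)


def make_flaky_processor():
    """Return a processor that fails the first time it sees "b"."""
    seen = set()
    results = []

    def process(payload):
        if payload == "b" and payload not in seen:
            seen.add(payload)
            raise RuntimeError("simulated worker failure")
        results.append(payload.upper())

    return process, results


process, results = make_flaky_processor()
attempts = run(ReplayingSource(["a", "b", "c"]), process)
print(attempts)  # ['a', 'b', 'c', 'b']
print(results)   # ['A', 'C', 'B']
```

Only the failed message "b" is attempted twice, and every message ends up processed. Note that a replayed message may arrive out of its original order, as "b" does here.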

Enterprises use Storm to prevent certain outcomes or to optimize their objectives. Here are some “prevent” and “optimize” use cases.

                     | “Prevent” Use Cases                       | “Optimize” Use Cases
Financial Services   | Securities Fraud, Compliance Violations   | Order Routing, Pricing
(industry not given) | Security Breaches, Network Outages        | Bandwidth Allocation, Customer Service
(industry not given) | Shrinkage, Stock outs                     | Offers, Pricing
(industry not given) | Machine Failures, Quality Assurance       | Supply Chain, Continuous Improvement
(industry not given) | Driver Monitoring, Predictive Maintenance | Routes, Pricing
(industry not given) | Application Failures, Operational Issues  | Personalized Content

How Storm Works

A Storm cluster has three sets of nodes:

  • Nimbus node (master node, similar to the Hadoop JobTracker):
    • Uploads computations for execution
    • Distributes code across the cluster
    • Launches workers across the cluster
    • Monitors computation and reallocates workers as needed
  • ZooKeeper nodes – coordinate the Storm cluster
  • Supervisor nodes – communicate with Nimbus through ZooKeeper, and start and stop workers according to signals from Nimbus
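
The division of labor among the three node types can be sketched as a small simulation. This is a toy model in plain Python, not Storm code: the classes Coordinator, Supervisor, and Nimbus are hypothetical stand-ins, the shared dictionary plays the role of the ZooKeeper nodes, and node failure is represented by a simple flag rather than by any real detection mechanism. The placement rule (fewest workers first) is also an assumption made for the example.

```python
class Coordinator:
    """Stand-in for the ZooKeeper nodes: a shared store of assignments."""

    def __init__(self):
        self.assignments = {}  # worker name -> supervisor node name


class Supervisor:
    """Runs whichever workers are currently assigned to its node."""

    def __init__(self, name, coordinator):
        self.name = name
        self.coordinator = coordinator
        self.running = set()
        self.alive = True

    def sync(self):
        # Start newly assigned workers and stop those no longer assigned here.
        wanted = {worker
                  for worker, node in self.coordinator.assignments.items()
                  if node == self.name}
        self.running = wanted if self.alive else set()


class Nimbus:
    """Master: places workers on live supervisors via the coordinator."""

    def __init__(self, coordinator, supervisors):
        self.coordinator = coordinator
        self.supervisors = supervisors

    def launch(self, workers):
        for worker in workers:
            self._assign(worker)

    def rebalance(self):
        # Move workers that sit on dead nodes onto live ones.
        live = {s.name for s in self.supervisors if s.alive}
        for worker, node in sorted(self.coordinator.assignments.items()):
            if node not in live:
                self._assign(worker)

    def _load(self, supervisor):
        nodes = self.coordinator.assignments.values()
        return sum(1 for node in nodes if node == supervisor.name)

    def _assign(self, worker):
        # Pick the live node with the fewest workers (ties broken by name).
        live = [s for s in self.supervisors if s.alive]
        target = min(live, key=lambda s: (self._load(s), s.name))
        self.coordinator.assignments[worker] = target.name


zk = Coordinator()
nodes = [Supervisor(f"node-{i}", zk) for i in (1, 2, 3)]
nimbus = Nimbus(zk, nodes)
nimbus.launch(["w1", "w2", "w3"])

nodes[2].alive = False  # node-3 dies
nimbus.rebalance()
for supervisor in nodes:
    supervisor.sync()

print(sorted(nodes[0].running))  # ['w1', 'w3']
print(sorted(nodes[1].running))  # ['w2']
print(sorted(nodes[2].running))  # []
```

Nimbus and the supervisors never call each other directly in this sketch. They only read and write the shared store, which mirrors the statement that supervisors communicate with Nimbus through ZooKeeper.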


Five key abstractions help to understand how Storm processes data:

  • Tuples – ordered lists of elements. For example, a “4-tuple” might be (7, 1, 3, 7)
  • Streams – unbounded sequences of tuples.
  • Spouts – sources of streams in a computation (e.g. a Twitter API)
  • Bolts – process input streams and produce output streams. They can: run functions; filter, aggregate, or join data; or talk to databases.
  • Topologies – the overall calculation, represented visually as a network of spouts and bolts (as in the following diagram)
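
The five abstractions above can be mimicked with ordinary Python generators. The sketch below is an analogy only and does not use the Storm API: sentence_spout, split_bolt, count_bolt, and build_topology are hypothetical names, and the word-count task is an assumed example. The topology here is a simple chain, whereas a real topology can be a branching network of spouts and bolts.

```python
from collections import Counter
from itertools import islice


def sentence_spout():
    """Spout: an unbounded stream of 1-tuples, each holding a sentence."""
    sentences = ["storm processes streams", "streams of tuples"]
    i = 0
    while True:
        yield (sentences[i % len(sentences)],)
        i += 1


def split_bolt(stream):
    """Bolt: turn each sentence tuple into one (word,) tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keep a running count and emit (word, count) tuples."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def build_topology():
    """Topology: wire the spout into the two bolts."""
    return count_bolt(split_bolt(sentence_spout()))


# The stream never ends, so take only the first six output tuples.
output = list(islice(build_topology(), 6))
print(output)
# [('storm', 1), ('processes', 1), ('streams', 1),
#  ('streams', 2), ('of', 1), ('tuples', 1)]
```

Each stage consumes tuples one at a time as they arrive, so nothing waits for the input to finish. That is the essential difference from a batch job over a bounded data set.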


Storm users define topologies for how to process the data when it comes streaming in from the spout. When the data comes in, it is processed and the results are passed into Hadoop.

Learn more about how the community is working to integrate Storm with Hadoop and improve its readiness for the enterprise.

Apache Top-Level Project Since
September 2014