April 20, 2016

Announcing Apache Storm 1.0.0

Introduction

The Apache Storm community recently announced the stable release of Apache Storm 1.0.0. This is a significant release: it delivers features for enterprise readiness, operational simplicity and ease of use, with major improvements in performance, scalability, debuggability and manageability.

Highlights

Here are some of the features introduced in Storm 1.0:


  1. Improved Nimbus HA: Multiple instances of the Nimbus service can run in a cluster, with leader election when a Nimbus node fails, so Nimbus hosts can join or leave the cluster at any time. This removes Nimbus as a single point of failure, so that failures affecting running topologies can be detected and recovered from automatically.
  2. Automatic Back Pressure Support: If a receiving component cannot keep up with incoming tuples, the sending component can throttle the input based on configurable high and low watermarks. This throttling works without enabling acking and is implemented independently of the Spout APIs.
  3. Windowing and State Management: Windowing computations are among the most common use cases in stream processing, and support for them is a must for deriving actionable insights from real-time data streams. Storm 1.0 supports sliding and tumbling windows based on time duration and/or event count. With the addition of state management to core Storm, the framework automatically and periodically snapshots the state of the bolts across the topology in a consistent manner. There is a default in-memory state implementation and a Redis-backed implementation that provides state persistence.
  4. New Storm Connectors: Storm 1.0 introduces support for the Cassandra and MongoDB NoSQL stores and for the Elasticsearch and Solr indexing and search servers.
  5. Distributed Cache API: Storm 1.0 introduces a distributed cache API that allows files (BLOBs) to be shared among topologies. Files in the distributed cache can be updated at any time from the command line, which removes the need to repackage and redeploy the entire topology when bundled resource data changes. This also improves start-up time.
  6. Pacemaker Storm Daemon: ZooKeeper has long been considered a bottleneck for managing heartbeats from workers and supervisors, limiting Storm scalability because of the high volume of writes from workers. Pacemaker is an optional Storm daemon designed to process heartbeats from workers; it functions as a simple in-memory key/value store for heartbeats.
  7. Storm Kafka Spout using new Client APIs: In our experience solving real-world enterprise use cases for real-time analysis and rendering of streaming data, Storm and Kafka go together like peanut butter and jelly! This combination of messaging and processing technologies enables stream processing at linear scale. Storm 1.0 introduces a Storm Kafka Spout that uses the Kafka 0.9 consumer APIs.
  8. Resource Aware Scheduling: Resource Aware Scheduling aims to increase overall throughput by maximizing resource utilization while minimizing network latency. In Storm 1.0, it schedules topology tasks among workers to best meet the CPU and memory requirements specified for individual topology components. Future Storm releases will extend this resource awareness to minimize network latency as well.
  9. Storm Topology Event Inspector: Storm 1.0 introduces the ability to view tuples flowing through a topology and to turn debug events on or off without stopping or restarting the entire topology. The user can select a specific Spout or Bolt, specify a configurable number of events to view, and see the incoming and outgoing events for that component.
  10. Storm Performance Improvements: Storm 1.0 includes several performance-related enhancements. These enhancements, in areas such as acking and core Storm, deliver significant performance improvements over previous versions.
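
To make the difference between tumbling and sliding count-based windows (item 3) more concrete, here is a minimal sketch in plain Java. It does not use Storm's windowing API: the class `CountWindows`, the method `windows` and its parameters `length` and `slide` are hypothetical names chosen for illustration, and emitting shorter windows at the start of the stream is an assumption of this sketch, not a statement about Storm's behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: plain Java, NOT Storm's windowing API.
// All names here are hypothetical.
final class CountWindows {

    /**
     * Emits one window after every `slide` events. Each window holds at most
     * the last `length` events seen so far (assumption: early windows may be
     * shorter than `length`).
     * When slide == length the windows do not overlap (tumbling);
     * when slide < length consecutive windows overlap (sliding).
     */
    static <T> List<List<T>> windows(List<T> events, int length, int slide) {
        if (length <= 0 || slide <= 0) {
            throw new IllegalArgumentException("length and slide must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int end = slide; end <= events.size(); end += slide) {
            int start = Math.max(0, end - length);
            // Copy the view so each window is independent of the input list.
            result.add(new ArrayList<>(events.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> events = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(windows(events, 3, 3)); // tumbling: length 3, slide 3
        System.out.println(windows(events, 4, 2)); // sliding: length 4, slide 2
    }
}
```

For the events 1 through 6, the tumbling call yields `[[1, 2, 3], [4, 5, 6]]`, while the sliding call yields `[[1, 2], [1, 2, 3, 4], [3, 4, 5, 6]]`: each event belongs to exactly one tumbling window but can appear in several sliding windows.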

 

Acknowledgements

At Hortonworks we believe in Storm. Enterprise adoption is growing, and Storm is solving real-world use cases, helping businesses derive actionable insights from streaming data in real time. Thanks to the Hortonworks Storm development team and the Apache Storm community for contributing to this release. Expect to hear more from us as our development efforts continue.

 


Comments

  • Hi, I have set up kafkaSpout with Storm 0.9.5, 1.0.0 and 1.0.2. Record processing is much slower in 1.0.0 and 1.0.2 than in 0.9.5: processing 100,000 string tuples took 2 minutes with 0.9.5 and 10 minutes with 1.0.0 and 1.0.2. I am consuming the strings from Kafka 0.9.
    Topology: kafkaSpout(1) -> bolt(1).
    Please advise, as we need to finalize the design and the version.
