cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button
February 05, 2016
prev slideNext slide

Windowing and State checkpointing in Apache Storm

Apache Storm is the scalable, fault-tolerant realtime distributed processing engine that allows you to handle massive streams of data in realtime, in parallel, and at scale.

Windowing computations is one of the most common use cases in stream processing. Support for windowing computations is a must for deriving actionable insights from real time data streams. So far Apache Storm relied on developers to built their own windowing logic and there were no high level abstractions for developers to define a Window in a standard way in a Storm Topology. Support for doing stateful computation (i.e save the state of a bolt’s computation) was also limited and developers had to write custom logic in bolts or rely on Trident apis.

In this blog we cover,

  • Overview of windowing, specifically time and count based sliding and tumbling windows and the internals of the windowing support that we recently added in Apache Storm.
  • The windowing apis and how to build Storm topologies for doing windowing computations.
  • Event-time (vs processing time) based windows using the concept of watermarks.
  • Stateful computations in core Storm with the recently added support for State and the internals of the state checkpointing mechanism.
  • How stateful computation can be used to prune duplicate window evaluations.

Apache Storm adds reliable real-time data processing capabilities to Enterprise Hadoop. With the addition of windowing and stateful processing capabilities, Apache Storm could be used in even more real time streaming use cases.

Click this link to continue reading

Tags:

Leave a Reply

Your email address will not be published. Required fields are marked *