Storm, Stream Data Processing

Bringing Stream Data Processing to Hortonworks Data Platform

YARN opened up Hadoop for data access by applications other than MapReduce. One of the most commonly demanded use cases was the antithesis of batch: stream processing in Hadoop. Apache Storm is a fully certified component of HDP 2.1, and our customers are using stream processing for real-time analysis of common new types of data, such as sensor and machine data.
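To make the contrast with batch processing concrete, here is a minimal plain-Java sketch of per-event processing. Each reading from a hypothetical sensor updates a sliding-window average as soon as it arrives, instead of being stored for a later batch job. The class and method names (`SlidingWindowAverage`, `onReading`) are illustrative assumptions, and this is not Storm's API. In Storm, logic like this would typically run inside a bolt of a topology.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy illustration of stream (per-event) processing in plain Java.
 * NOT the Apache Storm API: class and method names are hypothetical.
 * Each reading is handled the moment it arrives, and the average over the
 * last N readings is updated incrementally rather than recomputed in a batch.
 */
public class SlidingWindowAverage {
    private final int windowSize;                      // how many recent readings to keep
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;                          // running sum of readings in the window

    public SlidingWindowAverage(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Consume one reading and return the average of the current window. */
    public double onReading(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > windowSize) {
            sum -= window.removeFirst();               // evict the oldest reading
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        SlidingWindowAverage avg = new SlidingWindowAverage(3);
        double[] readings = {10.0, 20.0, 30.0, 40.0};  // hypothetical sensor values
        for (double r : readings) {
            System.out.printf("reading=%.1f avg=%.1f%n", r, avg.onReading(r));
        }
    }
}
```

With a window of three, the four sample readings produce averages of 10.0, 15.0, 20.0 and 30.0. The fourth result covers only the last three readings because the oldest one has been evicted. Keeping a running sum means each event costs constant time, which suits a continuous stream.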

Initiative Goals

Streams in HDP
Bringing stream data processing to enterprise Apache Hadoop and Hortonworks Data Platform.
Storm on YARN
Use the YARN Hadoop operating system to allow multiple workloads to be applied to Hadoop data simultaneously.
Enterprise Ready
Bring baseline high availability, management, authentication and advanced scheduling to Storm.

Status

The team at BackType/Twitter originally conceived Storm to analyze the tweet stream in real time. Storm became an official Apache incubation project in September 2013. Hortonworks engineering is deeply committed to integrating Storm with Hadoop.

Beginning with Hortonworks Data Platform version 2.1, Apache Storm is a fully certified component of HDP. The current version, Storm 0.9.1, replaces the 0MQ data transport with a pure-Java, Netty-based transport, which removes the need to install the 0MQ native binaries. Storm 0.9.1 also includes built-in support for Windows.

Essential Timeline

Streaming IN Hadoop
  • Install, Start & Stop via Ambari
  • Kafka, HBase & HDFS Connectors
  • Ganglia & Nagios Monitoring
Preview available: Storm 0.9.1 (HDP 2.1)
Enterprise Connectivity
  • Storm-on-YARN
  • Ingest & Notification for JMS
  • Data Persistence: EDWs, RDBMS, Cassandra
Improved Multi-Tenancy
  • HA Management w/Ambari
  • AD/LDAP Authentication Plugin
  • Declarative “wiring”
  • Hive Update Support
  • Advanced Scheduler

Technical Resources

Get started with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.
Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is enterprise grade, having been built, tested and hardened with enterprise rigor.
Contact Us
Hortonworks provides enterprise-grade support, services and training. Discuss how to leverage Hadoop in your business with our sales team.