Using Apache Storm with HDP – Demo and Code

Apache Storm and YARN extend Hadoop to handle real-time processing of data and provide the ability to process and respond to events as they happen. Our customers have told us about many use cases for this technology combination, and below we present a demo example, complete with code, so you can try it yourself.

For the demo below, we used our Sandbox VM, which is a full implementation of the Hortonworks Data Platform. The code for the demo resides in a GitHub repository. If you want to get started with Storm on YARN and HDP 2 Beta, check out our guide here.

Using Storm and Hadoop to optimize logistics

As part of a prototype we built for a customer, we are processing data coming from a fleet of vehicles, each of which has been instrumented to report back its position and events in real time. The customer requirement was to receive an alert when certain kinds of events exceed a threshold. The customer wanted to know when a driver was having problems so they could intervene.

In this example, events are streamed in real time. The solution topology is very simple and consists of one data source, called a spout, and a single data consumer, called a bolt.

The spout interfaces Storm to the streaming data source, which in this case is a JMS queue. It takes care of capturing the data and formatting it into what Storm calls tuples. Tuples are objects that contain the fields collected for each event. The tuples are passed to our bolt. The bolt is very simple. It keeps track of all the drivers it has seen and uses a hash table to track events. When the threshold is exceeded, an email message is generated and sent to dispatch.
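
To make the bolt's bookkeeping concrete, here is a minimal sketch in plain Java of the counting logic described above. It is not the code from the demo repository: the class name DriverEventTracker, its method names, and the choice to alert only once per driver are assumptions made for illustration, and the Storm bolt wiring and the email step are left out.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the bolt's bookkeeping: count events per driver
 * in a hash table and report when a driver first exceeds the threshold.
 */
class DriverEventTracker {
    private final int threshold;

    // Driver id -> number of events seen so far for that driver.
    private final Map<String, Integer> eventCounts = new HashMap<>();

    DriverEventTracker(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records one event for the given driver.
     * Returns true only for the event that first pushes the driver's count
     * above the threshold, so the caller raises one alert per driver
     * (an assumption; the original demo may alert differently).
     */
    boolean recordEvent(String driverId) {
        int count = eventCounts.merge(driverId, 1, Integer::sum);
        return count == threshold + 1;
    }

    /** Returns how many events have been recorded for the driver so far. */
    int countFor(String driverId) {
        return eventCounts.getOrDefault(driverId, 0);
    }
}
```

In a real bolt, something like recordEvent would be called for each incoming tuple that carries one of the event kinds of interest, and a true result would trigger the email to dispatch.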

Our demo

For the purposes of the demo, we created an event generator to simulate the driving events. It was coded with a fixed set of “bad” drivers.
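
A generator of this kind could look roughly like the sketch below. This is an illustration, not the generator from the repository: the class DrivingEventGenerator, the event type names, and the 50% chance that a bad driver produces a violation are all assumptions chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical event generator with a fixed set of "bad" drivers. */
class DrivingEventGenerator {

    /** One simulated event; the field names are illustrative. */
    static final class DrivingEvent {
        final String driverId;
        final String eventType;

        DrivingEvent(String driverId, String eventType) {
            this.driverId = driverId;
            this.eventType = eventType;
        }

        @Override
        public String toString() {
            return driverId + "," + eventType;
        }
    }

    // Assumed event vocabulary; the real demo may use different names.
    private static final String NORMAL = "normal";
    private static final String[] VIOLATIONS = {"speeding", "hard_braking", "lane_departure"};

    private final List<String> drivers;
    private final Set<String> badDrivers;
    private final Random random;

    DrivingEventGenerator(List<String> drivers, Set<String> badDrivers, long seed) {
        this.drivers = new ArrayList<>(drivers);
        this.badDrivers = new HashSet<>(badDrivers);
        this.random = new Random(seed); // seeded so runs are repeatable
    }

    /** Picks a random driver; only drivers in the bad set ever produce violations. */
    DrivingEvent next() {
        String driver = drivers.get(random.nextInt(drivers.size()));
        String type = NORMAL;
        if (badDrivers.contains(driver) && random.nextBoolean()) {
            type = VIOLATIONS[random.nextInt(VIOLATIONS.length)];
        }
        return new DrivingEvent(driver, type);
    }
}
```

Calling next() in a loop and publishing each event to the queue the spout reads from would produce a stream in which only the designated drivers ever cross an alert threshold.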

Here is what things look like when we start up Storm:

We first execute the jar file that contains our topology. A topology runs indefinitely, so it will process events until you kill it.

Our topology is now running. You can see the spouts are open and ready to receive events.

Now we start up our event generator. You can see the events being generated.

We can monitor the Storm log files and see the events coming in.

As the rule threshold is exceeded, we receive the emails. We used email alerts for this demo, but in a real-world implementation we could tie into almost any kind of monitoring infrastructure.

Looking at the code and trying it yourself

We would like you to be able to look at what we did and try it yourself. Our infrastructure is very simple. We used our Sandbox VM, which is a full implementation of the Hortonworks Data Platform. The code for the demo resides in a GitHub repository.

The code is structured to work with the Maven build infrastructure, so once you clone the repository you can quickly build the jars.

Storm itself is available from its project page at http://storm-project.net/. The latest build downloads are available there, along with instructions on how to build a cluster.

Have fun streaming!

If you want to get started with Storm on YARN and HDP 2 Beta, check out our guide here.
