Apache Storm and YARN extend Hadoop to handle real time processing of data and provides the ability to process and respond events as they happen. Our customers have told us many use cases for this technology combination and below we present a demo example complete with code so you can try it yourself.
For the demo below, we used our Sandbox VM which is a full implementation of the Hortonworks Data Platform. The code for the demo resides on a GitHub repository. If you want to get started with Storm on YARN and HDP 2 Beta – check out out guide here.
As part of a prototype we built for a customer we are processing data coming from a fleet of vehicles with each vehicle has been instrumented to report back its position and events in real time. The customer requirement was to receive an alert when certain kinds of events exceed a threshold. The customer then wanted to know when a driver was having problems so they could intervene.
In this example, events are being streamed in real time. The solution topology is very simple and consisted of one data source or spout and a single data consumer called a bolt.
The spout interfaced Storm to the streaming data source in this case it was a JMS queue. It took care of capturing the data and formatting it in to what Storm calls tuples. Tuples are objects that contain the fields collected for each event. The tuples are passed to our bolt. The bolt is very simple. It kept track of all the drivers it saw and used a hash table to track events. When the threshold was exceeded an email message was generated that went to dispatch.
For the purposes of the demo we created a event generator to simulate the driving events. It was coded with a fixed set of “bad” drivers.
Here is what things look like when we start up Storm:
We first execute the jar file that contains our topology. A topology lives forever so it will process events until you kill it.
Our topology is now running. You can see the spouts are open and ready to receive events.
Now we start up our event generator. You can see the events being generated.
We can monitor events in the Storm log files and we see events coming in
As the rule threshold is being exceeded we then receive the emails. We did it with emails alerts for this demo but in a real world implementation we could tie into almost any kind of monitoring infrastructure.
We would like you to be able look at what we did and try it yourself. Our infrastructure is very simple. We used our Sandbox VM which is a full implementation of the Hortonworks Data Platform. The code for the demo resides on a GitHub repository
The code is structured to work with the Maven build infrastructure. So once you clone the repository you can quickly build the jars.
The Storm infrastructure is available at their project page at http://storm-project.net/. The latest build downloads are available as well as instructions on how to build a cluster.
Have fun streaming!
If you want to get started with Storm on YARN and HDP 2 Beta – check out out guide here.