August 17, 2016

Demo #4 & Summary: Play-by-Play: Data Hacks & Demos @ #HS16SJ

Streaming analytics to create an accurate single buyer identity in real-time

The fourth and final demo of the Data Hacks & Demos session at Hadoop Summit San Jose was given by Simon Ball. It showcased how Apache NiFi moved parallel streams of data into Apache Spark, where further analysis combined Hortonworks Community Connection (HCC) data with Twitter data to create an accurate single buyer identity.

So what did Simon tell the audience?

The demo correlated reputation data from HCC and linked it back to some of the Twitter data. The HCC data is clean and straightforward, but the Twitter data is not. To identify the "right customer", it used Spark to federate reputation data from HCC, link queries across multiple data sets, and then combine the result with the data extracted from Twitter. (It also used Apache NiFi to rate-limit calls to the SMS service so as not to incur overages.)
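The linkage step above can be sketched in plain Python. This is a simplified, single-machine stand-in for the Spark federation the demo actually used, and the field names (`name`, `reputation`, `display_name`, `handle`) are illustrative assumptions, not the demo's real schema:

```python
# Illustrative sketch: join clean HCC reputation records with messy Twitter
# profiles by normalizing names before matching. Field names are assumptions.

def normalize(name: str) -> str:
    """Lowercase and strip non-alphanumerics so messy Twitter display
    names can match clean HCC names."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def link_profiles(hcc_records, twitter_profiles):
    """Return Twitter profiles enriched with HCC reputation where the
    normalized names match."""
    by_name = {normalize(r["name"]): r for r in hcc_records}
    linked = []
    for p in twitter_profiles:
        match = by_name.get(normalize(p["display_name"]))
        if match:
            linked.append({**p, "hcc_reputation": match["reputation"]})
    return linked

hcc = [{"name": "Brad Anderson", "reputation": 1450}]
tweets = [
    {"handle": "@brad_a", "display_name": "brad anderson!"},
    {"handle": "@someone", "display_name": "Jane Doe"},
]
linked = link_profiles(hcc, tweets)
```

In the real demo this join ran in Spark across full data sets; the normalization idea is the same, just expressed here without a cluster.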

Once the data is pulled together, it is possible to start visualizing it and to spot duplicate names. For example, there are many Brad Andersons on Twitter, so which one do we really care about? Only the ones interested in Hadoop are prospects of interest. So all of this data goes into a Spark machine learning model that clusters it, identifying communities within the data and letting us separate the "Hadoop people" from the "non-Hadoop people."
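To make the clustering idea concrete, here is a toy stand-in for the Spark ML step: score each profile by how many Hadoop-ecosystem keywords its bio mentions, then split the profiles into two clusters with a tiny one-dimensional k-means. The keyword list, bios, and feature choice are all illustrative assumptions; the demo's actual Spark model is not shown in the source:

```python
# Toy stand-in for the Spark ML clustering step: one numeric feature per
# profile (keyword count), two-cluster 1-D k-means to separate
# "Hadoop people" from everyone else. All data here is made up.

KEYWORDS = {"hadoop", "spark", "nifi", "hive", "hdfs"}

def score(bio: str) -> float:
    """Count Hadoop-ecosystem keywords in a bio (a crude feature)."""
    return sum(1.0 for w in bio.lower().split() if w.strip(".,#!") in KEYWORDS)

def kmeans_1d(xs, iters=20):
    """Two-cluster 1-D k-means; returns (labels, centers)."""
    centers = [min(xs), max(xs)]
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in xs]
        for k in (0, 1):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers

bios = [
    "Building pipelines with Hadoop Spark and NiFi",
    "Foodie. Travel. Photography.",
    "Hive and HDFS admin, big data all day",
    "Cat pictures mostly",
]
labels, centers = kmeans_1d([score(b) for b in bios])
hadoop_cluster = 0 if centers[0] > centers[1] else 1
hadoop_people = [b for b, lab in zip(bios, labels) if lab == hadoop_cluster]
```

A real pipeline would use richer features (follows, hashtags, linked HCC reputation) and Spark MLlib's distributed clustering, but the grouping logic is the same shape.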

This data was then pushed back out through a Storm topology that processed incoming votes in real time, combining them with the "Hadoop people" information to produce the set of attendees eligible to win the light boxes!
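The filtering that topology performs can be sketched as a simple stream consumer. This is not Storm code, just a hedged illustration of the logic: process vote events in arrival order and keep only voters who appear in the "Hadoop people" set produced upstream. The event fields are assumptions:

```python
# Illustrative stand-in for the Storm topology step: stream of vote events
# in, de-duplicated pool of eligible attendees out. Event schema is assumed.

def eligible_winners(vote_stream, hadoop_people):
    """Consume vote events one at a time; yield each eligible voter once."""
    seen = set()
    for event in vote_stream:
        voter = event["voter"]
        if voter in hadoop_people and voter not in seen:
            seen.add(voter)
            yield voter

hadoop_people = {"Brad Anderson", "Scott Seligman"}
votes = [
    {"voter": "Scott Seligman", "choice": "demo4"},
    {"voter": "Jane Doe", "choice": "demo2"},
    {"voter": "Scott Seligman", "choice": "demo4"},
]
pool = list(eligible_winners(votes, hadoop_people))
```

In Storm the same logic would live in a bolt fed by a vote spout, with the eligible set arriving from the Spark side; here a generator plays both roles.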

One of the lucky winners was Scott Seligman, pictured here with Joe Witt, host of the entire Data Hacks and Demos session, Apache NiFi PMC member, and Senior Director of Engineering at Hortonworks.




In conclusion: in 20 minutes, the combined team of Joe Witt, Jeremy Dyer, Kay Lerch and Simon Ball showcased how a brick-and-mortar retail store could identify which customers are walking in the door, greet them, interact with them, and provide personalized offers in real time. Are you ready to try this yourself? Here are some starting points.

Data Hacks Demos Hortonworks Hadoop Summit Apache NiFi

