August 08, 2016

Demo #2: Play-by-Play: Data Hacks & Demos @ #HS16SJ

Apache NiFi to prioritize which images should be sent to Spark in the cloud for computer vision machine learning

During the 2nd demo of the Data Hacks & Demos session at Hadoop Summit San Jose, Simon Ball demonstrated how to take data received from the edge and run facial recognition on a more powerful cloud-based cluster. Everything ran on Azure: Apache NiFi to collect the data, Kafka as the substrate across all the analytics, Spark running on top of YARN, and Zeppelin on top of that.


So what did Simon tell the audience?

Apache NiFi provides real-time edge analytics for basic facial recognition. But sometimes you need more powerful computer vision machine learning.

Edge devices have limited processing power, which only allows for basic facial recognition. Using that basic facial recognition, Apache NiFi lets you prioritize which images are more important than others. Then, with Apache NiFi's site-to-site protocol, the prioritized images are transferred first, along with the metadata from the bar codes on the badges. From there, on the cluster that has received the prioritized images, we use Spark and Zeppelin together with an additional library, dlib, which specializes in computer vision machine learning.
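To make the prioritization idea concrete, here is a minimal sketch in plain Python of "send the most interesting images first". It is not NiFi code and not what was shown in the demo: in NiFi this kind of ordering is normally configured in the flow rather than hand-written. The names `QueuedImage`, `ImagePriorityQueue`, and `face_score` are hypothetical, and the score is assumed to come from whatever basic face detection runs on the edge device.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(order=True)
class QueuedImage:
    # heapq pops the smallest item, so we store the negated score
    # to get "highest face_score first".
    sort_key: float
    # Insertion counter: keeps first-in-first-out order among equal scores.
    seq: int
    name: str = field(compare=False)
    metadata: Dict[str, str] = field(compare=False, default_factory=dict)


class ImagePriorityQueue:
    """Hypothetical edge-side queue: best-scoring images are sent first."""

    def __init__(self) -> None:
        self._heap: List[QueuedImage] = []
        self._seq = 0

    def put(self, name: str, face_score: float,
            metadata: Optional[Dict[str, str]] = None) -> None:
        # face_score is an assumed output of the basic edge face detection.
        item = QueuedImage(-face_score, self._seq, name, metadata or {})
        self._seq += 1
        heapq.heappush(self._heap, item)

    def drain(self) -> List[str]:
        """Return image names in the order they would be transferred."""
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap).name)
        return order


queue = ImagePriorityQueue()
queue.put("blurry.jpg", 0.2, {"badge": "A-100"})
queue.put("clear_face.jpg", 0.9, {"badge": "A-101"})
queue.put("second_clear_face.jpg", 0.9, {"badge": "A-102"})
transfer_order = queue.drain()
```

The badge metadata travels with each image, mirroring how the bar code information accompanies the prioritized images to the cluster.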

How did facial recognition with Spark work?

In a cluster running in the cloud, Spark's machine learning capability and its ability to parallelize across very large datasets make more sophisticated analytics possible. For example, one can compare and correlate data against an entire customer database, which is not practical to store on a Raspberry Pi edge device in a store. We can also take advantage of Spark's support for numpy and its ability to crunch large numbers of matrices to perform facial alignment and identify facial landmarks. We can then pass the facial landmark vectors into classifiers that can be trained in Spark and compare them with reference photos, at which point the system can start to tell you names based solely on images (without needing the bar code information used earlier).
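As a rough illustration of the last step, comparing a facial vector against reference photos, here is a small sketch that uses only the Python standard library. It is a stand-in, not the classifier from the demo: it swaps in a plain nearest-neighbor match on Euclidean distance, and it assumes the vectors have already been extracted by some upstream step. The function names, the toy three-element vectors, and the `max_distance` threshold of 0.6 are all assumptions made for the example.

```python
import math
from typing import Dict, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(vector: Sequence[float],
             references: Dict[str, Sequence[float]],
             max_distance: float = 0.6) -> Optional[str]:
    """Return the name of the closest reference vector, or None.

    max_distance is an illustrative cut-off (an assumption): anything
    farther away than this from every reference is reported as unknown.
    """
    best_name, best_distance = None, float("inf")
    for name, reference in references.items():
        distance = euclidean(vector, reference)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else None


# Toy reference vectors; real facial vectors would be much longer.
references = {
    "alice": [0.10, 0.20, 0.30],
    "bob": [0.90, 0.80, 0.70],
}
match = identify([0.12, 0.21, 0.29], references)
stranger = identify([5.0, 5.0, 5.0], references)
```

On a cluster, the same comparison would run in parallel across the whole reference database, which is the part that does not fit on an edge device.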

That was the 2nd demo of Data Hacks & Demos at Hadoop Summit San Jose. The 3rd demo, using IoT to get real-time feedback, is up next in this blog series. In the meantime, to get started with building something like this yourself, check out these links:

