During the second demo of the Data Hacks & Demos session at Hadoop Summit San Jose, Simon Ball demonstrated how to take data received from the edge and run facial recognition on a more powerful cloud-based cluster. The whole stack ran on Azure: Apache NiFi to collect the data, Kafka as the substrate across all the analytics, Spark running on top of YARN, and Zeppelin on top of that.
Apache NiFi provides real-time edge analytics for basic facial recognition, but sometimes you need more powerful computer vision machine learning.
Edge devices have limited power and processing capacity, which only allows for basic facial recognition. Using that basic facial recognition, Apache NiFi lets you prioritize which images are more important than others. Then, with Apache NiFi’s site-to-site protocol, the prioritized images are transferred first, along with the metadata from the bar codes on the badges. On the cluster that receives the prioritized images, we use Spark and Zeppelin together with an additional library, dlib, which specializes in computer vision machine learning.
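To make the prioritization idea concrete, here is a minimal sketch in plain Python. It is not NiFi's API and not the code from the demo: the function name `transfer_order`, the tuple layout, and the `face_score` values are all hypothetical. It only shows the ordering logic, where images with a stronger face detection score leave the edge first and ties keep their arrival order.

```python
import heapq


def transfer_order(images):
    """Return image names in the order they would be sent.

    `images` is an iterable of (name, badge_id, face_score) tuples.
    All three fields are illustrative assumptions, not NiFi attributes.
    A higher face_score means a higher priority.
    """
    heap = []
    for seq, (name, badge_id, face_score) in enumerate(images):
        # heapq is a min-heap, so negate the score to pop the best image first.
        # seq breaks ties so equally scored images keep arrival order.
        heapq.heappush(heap, (-face_score, seq, name, badge_id))

    order = []
    while heap:
        _, _, name, _ = heapq.heappop(heap)
        order.append(name)
    return order


if __name__ == "__main__":
    captured = [
        ("a.jpg", "B1", 0.2),
        ("b.jpg", "B2", 0.9),
        ("c.jpg", "B3", 0.9),
        ("d.jpg", "B4", 0.0),
    ]
    print(transfer_order(captured))
```

In a real flow the score would come from whatever basic detector runs on the edge device, and the ordering would be configured in NiFi itself rather than written by hand.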
In a cluster running in the cloud, Spark’s machine learning capability and its ability to parallelize across very large datasets make more sophisticated analytics possible. For example, one can compare and correlate data against an entire customer database, which is not practical to store on a Raspberry Pi edge device in a store. We can also take advantage of Spark’s support for NumPy and its ability to crunch large numbers of matrices to identify facial landmarks and perform facial alignment. We can then take the facial landmark vectors and pass them into classifiers that can be trained in Spark, compare them with reference photos, and have the system start to tell you names based solely on images (without needing the bar code information used earlier).
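As a rough illustration of the last step, comparing a facial vector against reference photos, here is a small pure-Python sketch. It swaps in a simple nearest-neighbor lookup by Euclidean distance instead of a classifier trained in Spark, and it is not the code used in the demo. The function name `nearest_reference`, the three-element vectors, and the `max_distance` threshold are all assumptions made for illustration; real facial vectors would be much longer.

```python
import math


def nearest_reference(vector, references, max_distance=0.6):
    """Find the reference name whose vector is closest to `vector`.

    `references` maps a name to a vector of the same length.
    `max_distance` is an arbitrary placeholder threshold (an assumption):
    anything farther away is treated as "no match".
    Returns (name_or_None, distance_to_closest_reference).
    """
    best_name, best_dist = None, math.inf
    for name, ref in references.items():
        dist = math.dist(vector, ref)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_dist = name, dist

    if best_dist <= max_distance:
        return best_name, best_dist
    return None, best_dist


if __name__ == "__main__":
    # Made-up reference vectors standing in for vectors from reference photos.
    refs = {
        "alice": [0.0, 0.0, 1.0],
        "bob": [1.0, 0.0, 0.0],
    }
    name, dist = nearest_reference([0.1, 0.0, 0.9], refs)
    print(name, round(dist, 3))
```

On a cluster, the same comparison could be distributed across a much larger reference set, which is the part that does not fit on an edge device.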
That was the second demo of Data Hacks & Demos at Hadoop Summit San Jose. The third demo, on using IoT to get real-time feedback, is up next in this blog series. In the meantime, to get started with building something like this yourself, check out these links: