March 01, 2018



Edge collection helps companies leverage near real-time analysis of IoT data. Apache NiFi enables ingestion, routing, transformation, and mediation of data with bidirectional communication and end-to-end data provenance. The MiNiFi C++ agent complements Apache NiFi by collecting sensor data at the edge and executing across your IoT infrastructure; this is what we call edge intelligence.

By 2025, it is expected that 75 billion connected devices will be deployed worldwide [1]. This statistic alone demonstrates the need for edge collection and intelligence. The MiNiFi C++ agent supplies this in a lightweight and highly portable package. With so many devices and use cases in the market, we built a highly configurable agent to fulfill user requirements.

In this blog post, we’ll discuss the newly released MiNiFi C++ agent’s features and how it supports edge intelligence and collection. First, we’ll focus on the agent’s high-level requirements and explore what they mean for developers and deployment specialists. The requirements, while not a full list, lay the framework for our discussion. Following the requirements, we’ll look at what an extension is and the new features extensions bring. Finally, we’ll look ahead to HDF 3.2, when command and control will integrate seamlessly to complete the data-in-motion vision.


HDF 3.1 features

Agent Requirements

Agent shall be lightweight and executable on a variety of systems

Devices have varying requirements. Thus, agents must be lightweight and portable. We accomplished this by writing C++ code that is portable across most Unix platforms. HDF 3.1 delivers features that further reduce agent size through build customization.

Each agent shall support the same functional paradigm as Apache NiFi and support bidirectional communication

An important requirement of MiNiFi C++ is that it follow the same paradigm as Apache NiFi: the agent supplies the same provenance, persistence, and reliability guarantees. The default agent also supports bidirectional communication with one or more Apache NiFi instances. Deployed devices must be able to extract meaningful data, such as logs, telemetry, or network data, and send it reliably to your warehouse for analysis. While MiNiFi C++ lives in the flow management picture of HDF 3.1, it also supports stream processing, with the ability to publish and subscribe to Kafka and MQTT topics.

HDF 3.1 begins to deliver parity with Apache NiFi’s vast processor ecosystem. From stream processing to Expression Language, we have added several new processors that support the edge collection vision [3]. Use cases in your NiFi flows may require processing, data manipulation, and grooming; by fulfilling this requirement, MiNiFi C++ makes it much easier to enable edge processing and data collection. Coupled with the requirement that the agent be lightweight and run on a variety of systems, this lets developers and deployment specialists move processing into their IoT infrastructure without building and deploying custom software for each agent.

Agents shall be fully controllable and be good stewards of edge resources

In addition to being lightweight, MiNiFi C++ agents must also be good stewards of your ecosystem. That’s why HDF 3.1 introduces resource management as a cornerstone requirement: agents can monitor resource utilization on edge devices, and our power manager service will monitor resource consumption within a flow and adjust as necessary. This leads to reduced power and resource consumption across forward-deployed edge devices.
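To make the idea concrete, here is a minimal Python sketch of the kind of check a resource-aware service might perform on an edge device. This is not the agent’s power manager; the psutil-based thresholds and the back-off action are purely hypothetical.

```python
# Illustrative only: the kind of resource check a power-manager-like
# service might perform on an edge device. Thresholds are hypothetical.
import time
import psutil

CPU_LIMIT = 75.0   # percent, hypothetical ceiling
MEM_LIMIT = 60.0   # percent, hypothetical ceiling

def should_throttle():
    """Return True when the device is under pressure and collection
    should slow down to conserve power and resources."""
    cpu = psutil.cpu_percent(interval=1)   # sample CPU over one second
    mem = psutil.virtual_memory().percent  # current memory utilization
    return cpu > CPU_LIMIT or mem > MEM_LIMIT

while True:
    if should_throttle():
        print("Resource pressure detected; backing off collection rate")
        time.sleep(10)   # widen the scheduling interval
    else:
        time.sleep(1)
```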

New Extensions

HDF 3.1 brings several exciting new features that explore the possibilities of IoT [6]. New in the MiNiFi C++ agent is the concept of extensions. Extensions reflect one or more features, processors, or controller services within the agent, and agents can be built with any combination of them. This allows us to support a myriad of applications, from devices without persistent storage to those with processing or memory limitations. With minimal options selected, consisting of our default set of processors and Site-to-Site, and compiling with -Os, we see a binary as small as 3.5 MB. The following are the new extensions within HDF 3.1:


ExecuteScript allows execution of Python and Lua scripts to apply custom logic to flowfiles within a process session. This enables scripting recipes in the language of your choice, with uses ranging from alerting within edge devices to managing data flows. Python scripts have access to all packages in the environment, while Lua supports a select list [3].
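As a rough sketch of what such a script can look like, assuming a NiFi-style scripting API (an onTrigger(context, session) entry point with session.get(), session.transfer(), and a REL_SUCCESS relationship supplied by the agent), the Python below tags each flowfile with a custom attribute; the attribute name is made up.

```python
# Minimal sketch of an ExecuteScript Python body. Assumes the agent
# exposes a NiFi-style session API and REL_SUCCESS relationship.
def onTrigger(context, session):
    flow_file = session.get()
    if flow_file is None:
        return
    # Tag the flow file; custom alerting or routing logic for the
    # edge device would live here instead. Attribute name is made up.
    flow_file.addAttribute("edge.checked", "true")
    session.transfer(flow_file, REL_SUCCESS)
```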


LibArchive is a collection of archive processors: FocusArchiveEntry, UnfocusArchiveEntry, and ManipulateArchive. FocusArchiveEntry focuses the content of a flow file on a target entry within an archive so it can be manipulated, until UnfocusArchiveEntry returns the archive to its original state. This allows real-time manipulation of data within an archive. ManipulateArchive will remove, move, copy, or touch targets within an archive without the need for focus.
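The focus/unfocus idea is easy to picture outside the agent. The plain-Python analogy below, using the standard tarfile module with made-up file and entry names, pulls one entry out of an archive, edits it, and rebuilds the archive with that entry replaced; the processors do this within the flow file itself.

```python
# Plain-Python analogy for focusing on one archive entry, editing it,
# and restoring the archive. File and entry names are made up.
import io
import tarfile

with tarfile.open("readings.tar", "r") as archive:
    members = archive.getmembers()
    contents = {m.name: archive.extractfile(m).read()
                for m in members if m.isfile()}

# "Focus" on one entry and manipulate its data.
contents["sensor.log"] = contents["sensor.log"].replace(b"ERROR", b"ALERT")

# "Unfocus": rebuild the archive with the edited entry in place.
with tarfile.open("readings.tar", "w") as archive:
    for name, data in contents.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
```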


GetUSBCamera captures frames of video from USB cameras and stores each frame in the content of a flowfile. The following photo is frame output from the GetUSBCamera processor. I adjusted the FPS to capture two frames per second so I could have a large set of example photos. Despite capturing hundreds of frames, I found only one publishable photo.

USB camera collection
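For comparison, the same capture task done by hand looks something like the OpenCV sketch below. The device index, output path, and the two-frames-per-second pacing (mirroring the FPS setting mentioned above) are assumptions.

```python
# Rough, hand-rolled equivalent of the capture GetUSBCamera performs.
# Device index 0 and the output path are assumptions.
import time
import cv2

cap = cv2.VideoCapture(0)          # first USB camera on the device
try:
    for i in range(10):            # grab a small batch of frames
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(f"frame_{i:03d}.jpg", frame)
        time.sleep(0.5)            # roughly two frames per second
finally:
    cap.release()
```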


PCAP is a new packet capture processor in this release. It captures all packets on an interface when switch or router capture isn’t possible. Each resulting flow file contains a configurable number of events, which can be analyzed at the edge or streamed to a warehouse. Flowfile content contains the full packet capture, and compression is available via LibArchive if the capture is streamed off the device.
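To picture what each flow file holds, here is an unofficial Scapy sketch that captures a fixed-size batch of packets from an interface and writes them as a pcap file; the interface name and batch size are assumptions, and this runs outside the agent.

```python
# Illustration of batching packets the way the PCAP processor does:
# capture N packets, then emit them as one pcap payload.
# Interface name and batch size are assumptions; requires privileges.
from scapy.all import sniff, wrpcap

BATCH_SIZE = 50                       # events per "flow file" in this sketch
packets = sniff(iface="eth0", count=BATCH_SIZE)
wrpcap("batch_000.pcap", packets)     # full packet capture, one batch per file
```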

Kafka and MQTT

Kafka and MQTT represent new streaming capabilities; both supply a publish and a consume processor for their respective broker. Kafka is the data broker at the center of HDF 3.1’s vision for stream processing, while MQTT is a lightweight messaging protocol designed with the constraints of edge devices in mind. Supporting these protocols furthers HDF’s vision of MiNiFi C++ agents empowering devices with edge intelligence and streaming data off the device so intelligence can be extracted from it. These streaming processors increase the breadth of compute engines your agents can reach.
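As a sketch of the device side of that picture, the snippet below uses the paho-mqtt client to publish a telemetry reading to a topic that a consuming MQTT processor in a MiNiFi C++ flow could subscribe to; the broker address, topic, and payload format are made up.

```python
# Hypothetical sensor publishing telemetry over MQTT; a MiNiFi C++ flow
# consuming from the same topic could pick this reading up.
# Broker host, topic, and payload format are assumptions.
import json
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("broker.example.com", 1883)           # MQTT broker on the LAN
payload = json.dumps({"sensor": "temp-01", "celsius": 21.7})
client.publish("edge/telemetry/temperature", payload, qos=1)
client.disconnect()
```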


The TensorFlow extension adds three processors: TFApplyGraph, TFConvertImageToTensor, and TFExtractTopLabels. TFApplyGraph applies a TensorFlow graph to the tensor protobuf supplied as input, running the model. TFConvertImageToTensor converts a supplied image into a tensor protobuf. TFExtractTopLabels extracts the top five labels by analyzing the graph’s output tensor, retrieving the highest scores and their positions, each corresponding to an inferred category. TensorFlow has been combined with GetUSBCamera to perform identification [7]. The linked article demonstrates using sensor input to identify objects using machine learning.
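The processors chain together roughly like the TensorFlow 1.x sketch below: load a frozen graph protobuf, convert an image to a tensor, run the model, and take the top five scores. The graph path, input/output tensor names, and image shape are assumptions for a typical image classifier, not the agent’s implementation.

```python
# Rough outline of the TF processor chain using TensorFlow 1.x APIs:
# load a frozen graph, turn an image into a tensor, run it, take top-5.
# File path, tensor names, and image shape are assumptions.
import numpy as np
import tensorflow as tf

# TFApplyGraph-style step: load the frozen graph protobuf.
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

# TFConvertImageToTensor-style step: image bytes -> input tensor.
image = np.random.rand(1, 224, 224, 3).astype(np.float32)  # stand-in image

with tf.Session(graph=graph) as sess:
    scores = sess.run("softmax:0", feed_dict={"input:0": image})

# TFExtractTopLabels-style step: top five scores and their positions.
top5 = np.argsort(scores[0])[::-1][:5]
for idx in top5:
    print(idx, scores[0][idx])
```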

Looking to the future

The upcoming release, HDF 3.2, will deliver command and control capabilities to monitor your IoT infrastructure. The entire enterprise, from executives to deployment specialists, will have access to information about agents and the data they collect. Developers and deployment engineers will have control over agents and will be able to gather data with the same provenance guarantees as Apache NiFi. Access to stream analytics from the edge offers customers ground truth of their data across their infrastructure.

As we move forward with the vision of edge intelligence, we are highly cognizant of resource management. Future work will introduce network management within flows, which will be useful for deployed devices that have multiple radios, such as WiFi and LTE. Interfaces can be selected in real time based on customer-defined criteria, and, coupled with Expression Language, that selection will be dynamic.

HDF 3.1

HDF 3.1 delivered many exciting capabilities. In this blog entry, we’ve explored the growth of IoT and focused on the MiNiFi C++ requirements we addressed and delivered in HDF 3.1. These high-level requirements led us to view MiNiFi C++ as a critical piece of edge infrastructure: the agents are highly flexible, lightweight, and portable across platforms.

Device agents must be good stewards of each of your systems. We evaluated the MiNiFi C++ requirements for HDF 3.1 along with our new set of extensions. Finally, we took a brief dive into what the future holds for MiNiFi C++ in HDF 3.2 and beyond. Our goal is to enable technical staff to develop and deploy flows to agents with ease; controlling those flows with full awareness of the edge environment will be paramount in upcoming releases.








