September 28, 2015

Unique Big Data Needs of the IoAT

Today, I’m excited to share that we have released the GA version of Hortonworks DataFlow (HDF), a new offering that directly addresses the unique big data needs of the Internet of Anything (IoAT). Hortonworks DataFlow is powered by Apache NiFi, a top-level open source project made available through the NSA Technology Transfer Program.

By making this technology a commercial offering, we now provide our customers the ability to connect, collect and curate data from a broad spectrum of connected yet disparate data sources – sensors, machines, geo-location devices, social feeds, connected cars, web clicks, server logs and more. These are forming the Internet of Anything (IoAT) and have the potential to create unprecedented opportunities for business innovation, optimization and efficiency.

To realize the potential of the massive and dynamically changing data streams generated by the IoAT, systems must be able to ingest and process this information in a timely fashion, before its value perishes. To derive value and insights, data in motion from the IoAT must be treated as dataflows, from source to destination, so that modern analytical applications can collect, conduct and curate the data in a secure, scalable and reliable manner.

Consider the case of the connected car, or perhaps a fleet of trucks, as a microcosm of the IoAT. Each vehicle generates a constantly updated data feed about location, velocity, engine performance, local weather, music preferences and more. Collecting this data consistently, reliably and in a timely manner supports two-way communication with the vehicle about preventative maintenance, location-based offers, localized real-time traffic redirection and more. Of key importance is that the data collection and consequent decision making happen within a practically useful timeframe. After all, finding out that inclement weather has caused a traffic jam after being stuck in it for 10 minutes isn’t very valuable. This is the nature of a perishable insight.

At the same time, physical conditions can cause connectivity to fluctuate. Determining whether a single “engine maintenance” signal is simply a temporary warning for preventative maintenance or the last critical signal before a vehicle became impaired is crucial for accurate decision making. The ability to curate the dataflow, zooming in on relevant and pertinent signals over the last 48 hours in a secure, scalable and reliable manner, is necessary to provide the correct context to determine the appropriate course of action.
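
As a rough illustration of that curation step, the sketch below narrows a vehicle's signals to the last 48 hours and checks whether an “engine maintenance” warning was the final thing heard from the vehicle. It is plain Python, not Hortonworks DataFlow or Apache NiFi code; the `Signal` record, its field names, the `engine_maintenance` label and the `classify_warning` helper are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Signal:
    # Hypothetical record shape; real vehicle telemetry will differ.
    vehicle_id: str
    kind: str
    timestamp: datetime


def recent_signals(signals: List[Signal], now: datetime,
                   window: timedelta = timedelta(hours=48)) -> List[Signal]:
    """Return the signals inside the look-back window, oldest first."""
    cutoff = now - window
    kept = [s for s in signals if cutoff <= s.timestamp <= now]
    return sorted(kept, key=lambda s: s.timestamp)


def classify_warning(signals: List[Signal], now: datetime) -> str:
    """Classify the latest 'engine_maintenance' signal within the window.

    Returns 'none' if no warning was seen, 'last_signal' if nothing
    arrived after the warning, and 'followed_by_data' otherwise.
    """
    recent = recent_signals(signals, now)
    positions = [i for i, s in enumerate(recent)
                 if s.kind == "engine_maintenance"]
    if not positions:
        return "none"
    if positions[-1] == len(recent) - 1:
        return "last_signal"
    return "followed_by_data"
```

A result of `last_signal` would suggest the vehicle went quiet after the warning and may need attention, while `followed_by_data` would suggest the warning was followed by normal traffic. A real deployment would also need to account for the connectivity gaps described above.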

Solving these three problems, a) collecting vast amounts of data from disparate, distributed data sources, b) processing and distilling this data into relevant and meaningful signals in a timely way, and c) communicating reliably and bi-directionally to and from the jagged edge, is critical to the accurate decision making and dynamic responses that support the transformative opportunities of the Internet of Anything.

Hortonworks DataFlow, powered by Apache NiFi, supports the IoAT with the following key capabilities:

  • Collect any and all IoAT data from dynamically changing, disparate and physically distributed sensors, machines, geo-location devices, clickstreams, files, and social feeds via a highly secure lightweight agent.
  • Reliably conduct secure point-to-point and bidirectional data flows in real time.
  • Curate data via tracing, parsing, filtering, joining, transforming, forking or cloning of dataflows to generate holistic context and support appropriate responses.

These capabilities are designed to address the unique requirements of the Internet of Anything, and they enable data stewards to construct secure and reliable data grids as continuous dataflows for on-time processing, from anything and from anywhere, at scale. As a result, Hortonworks customers can now securely and easily collect, conduct and curate any type of “data-in-motion” with HDF, as well as leverage traditional data at rest with HDP. Together, the two can be blended to provide both historical and perishable insights.

Data in motion by Hortonworks DataFlow

To find out more about how Hortonworks DataFlow complements the Hortonworks Data Platform, powered by Apache Hadoop, and supports collection of data-in-motion from a wide array of disparate data sources, click here.

About the Author

Matthew Morgan is the vice president of product and alliance marketing for Hortonworks. In this role, he leads Hortonworks product marketing, alliance marketing, vertical solutions marketing, and worldwide sales enablement. His background includes twenty years in enterprise software, including leading worldwide product marketing organizations for Citrix, HP Software, Mercury Interactive, and Blueprint. Feel free to connect with him on LinkedIn or visit his personal blog.


