April 21, 2016

The Next Market Leaders Will Power Their Businesses from IoAT Data Sources

To compete in the age of the Internet of Anything (IoAT), organizations are tapping into data sources from a network of physical objects to design new customer experiences. The companies that are furthest along are removing operational inefficiencies from their internal processes. They are using self-learning algorithms and dynamic model deployment for predictive maintenance to accelerate success in this space. These innovations are producing connected customer experiences and higher revenues, and they are improving the bottom line for shareholders. The organizations that act rapidly on IoAT data sources, with security and proper analytics, will become the next generation of IoAT market leaders.

To be clear, we are talking about a significant amount of data to be leveraged for rapid innovation. As of 2013, there were an estimated 2.8 zettabytes of data across the cybersphere, and that is expected to grow to 44 zettabytes by 2020, with 85% of this data coming from the Internet of Things. In the modern big data landscape, companies are challenged by the difficulty of interacting with all of these different sources.
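To put those figures in perspective, the short Python sketch below works out the yearly growth rate they imply. It assumes steady compound growth between 2013 and 2020 and reads the 85% share as applying to the 2020 total; both are simplifying assumptions, not claims made by the quoted estimate.

```python
# Figures quoted in the paragraph above, in zettabytes (ZB).
DATA_2013_ZB = 2.8
DATA_2020_ZB = 44.0
IOT_SHARE = 0.85  # assumed here to apply to the 2020 total


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


growth = compound_annual_growth(DATA_2013_ZB, DATA_2020_ZB, 2020 - 2013)
iot_2020_zb = DATA_2020_ZB * IOT_SHARE

print(f"Implied growth: {growth:.0%} per year")
print(f"IoT share of the 2020 estimate: {iot_2020_zb:.1f} ZB")
```

Under these assumptions the volume grows by roughly 48 percent every year, and about 37.4 of the 44 zettabytes would come from connected devices.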

Let’s take a minute to define what we mean by IoAT data. It includes any new data source generated from sensors and machines, server logs, clickstream data from web application servers, and social media, as well as files and emails. Most of this data is tagged with a geographic identifier and processed at the jagged edge. Traditional solutions that reside in the data center were not constructed to handle the complexity and security of IoAT data. The Hortonworks Data Platform (HDP), powered by Apache Hadoop, is the perfect location for storing data at growing volumes and speeds for future analysis. During data ingestion into HDP, data scientists are beginning to deploy self-learning models to the network edge, impacting business processes and user experiences across products and services.

As IoAT application components move outside the traditional comfort zone of secure data centers, the individuals managing operations will face an entirely new set of challenges. These challenges come from connected devices that use less reliable networks, inconsistent speeds, and different protocols, to name a few. It’s also worth noting that the challenges posed by IoAT data are very different from those addressed by ETL, data movement, and streaming technologies, because those technologies generally move data in one direction only.
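One common way to cope with an unreliable uplink is to buffer readings on the device and retry delivery with a growing delay. The sketch below illustrates that idea in plain Python. `StoreAndForwardBuffer`, `make_flaky_link`, and the `send` callback are hypothetical names invented for this illustration; they are not part of any Hortonworks product or API.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class StoreAndForwardBuffer:
    """Hold readings locally until the uplink accepts them (hypothetical helper)."""

    def __init__(self, send: Callable[[bytes], bool], max_items: int = 1000) -> None:
        self._send = send
        # Bounded queue: when it is full, the oldest reading is dropped first.
        self._pending: Deque[bytes] = deque(maxlen=max_items)

    def submit(self, reading: bytes) -> None:
        self._pending.append(reading)

    def flush(self, max_attempts: int = 5, base_delay: float = 0.5) -> int:
        """Send queued readings in order; return how many were delivered."""
        delivered = 0
        attempts = 0
        while self._pending and attempts < max_attempts:
            if self._send(self._pending[0]):
                self._pending.popleft()
                delivered += 1
                attempts = 0  # the link works again, reset the retry counter
            else:
                attempts += 1
                time.sleep(base_delay * 2 ** attempts)  # exponential backoff
        return delivered

    def __len__(self) -> int:
        return len(self._pending)


def make_flaky_link(failures_before_success: int) -> Tuple[Callable[[bytes], bool], List[bytes]]:
    """Build a fake uplink that rejects the first few sends (for demonstration)."""
    state = {"failures_left": failures_before_success}
    sent: List[bytes] = []

    def send(payload: bytes) -> bool:
        if state["failures_left"] > 0:
            state["failures_left"] -= 1
            return False
        sent.append(payload)
        return True

    return send, sent


send, sent = make_flaky_link(failures_before_success=2)
buffer = StoreAndForwardBuffer(send)
buffer.submit(b"temp=20.1")
buffer.submit(b"temp=20.3")
print(buffer.flush(base_delay=0.0), "readings delivered,", len(buffer), "still queued")
```

Readings stay in order and nothing is lost while the link is down, at the cost of local storage. The bounded queue is a deliberate trade-off for devices with little memory.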

IoAT applications must be able to access, process, and act on Data-In-Motion and Data-At-Rest. It’s critical that we no longer think only of delivering data back to home base. It is a common prediction that the next big data explosion will involve machine-to-machine application communication and two-way transfer of IoAT data. Expect machine learning algorithms, streaming analytics and sophisticated applications to continue to move closer and closer to the far edges of our networks. IoAT data movement, being multi-directional, will be analyzed in-flight between systems and devices. These modern applications demand trusted insights and full fidelity from all data originating at the far edge, especially as it flows to an Apache Hadoop Data Lake in real time.
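As a rough illustration of analyzing data in-flight, the Python sketch below flags unusual sensor readings as they stream past, without storing the full history. It uses a simple rolling z-score, which is a stand-in for whatever model a team would actually deploy; the function name, window size, and threshold are assumptions made for this example.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def flag_outliers(
    readings: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[float, bool]]:
    """Yield (reading, is_outlier) pairs while the stream is still flowing."""
    recent: Deque[float] = deque(maxlen=window)
    for value in readings:
        is_outlier = False
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma == 0:
                # A perfectly flat window: treat any change as an outlier.
                is_outlier = value != mu
            else:
                is_outlier = abs(value - mu) / sigma > threshold
        yield value, is_outlier
        recent.append(value)


stream = [20.0, 20.1, 19.9, 20.0, 20.1, 35.0, 20.0]
for value, is_outlier in flag_outliers(stream):
    print(value, "OUTLIER" if is_outlier else "ok")
```

Because the function is a generator, each reading is judged as soon as it arrives, so the same logic could run on a gateway as well as in the data center.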

We can all relate to the evolution of the internet, which started with humans interacting with computer systems and web servers. Now we must begin to work with machine-to-machine interactions. An IoAT machine or application can come in many shapes and sizes, from the simplest of sensors sending back a 1 or 0 (on or off messages) to a motor vehicle or jet airliner (yes, both are machines that can be considered potential Edge Nodes) generating enormous amounts of data daily. We can consider each machine or application component a potential IoAT data source. These IoAT machines deliver data payloads in every shape, size and velocity. Looking at the growing amount of data that we are moving onto our networks, it is clear that a large mountain to climb lies ahead of us.

An interesting comparison is how the cell phone has evolved over the last decade. Almost everyone carries a smartphone, and a smartphone is really a mini computer with a mobile operating system. Access to hundreds of applications that can be downloaded and installed transforms the smartphone into a “far” edge device. This kind of evolution is rapidly being extended to most IoAT devices, including network gateway routers. Here is an example of specifications for an Edge Node gateway router shipping today:

Gateway Router:

Compute: Dual-core 1.9GHz
Memory: 8GB RAM
Storage: 64GB SATA SSD
Operating System Options: Linux, Windows IoT Core, Ubuntu Snappy

Just like our cell phones, IoAT Edge Node gateway routers are rapidly evolving. They are being loaded with a CPU, memory, storage and a fully functioning operating system, and they are capable of running current and next-generation IoAT connection hub applications that service a significant number of endpoint devices.

Until now, conventional wisdom has had system designers filter the data coming from remote data sources and deliver only what is absolutely necessary and cost effective back to their secure data centers. This will still be the case for some modern applications, but there will also be plenty of times when we want to move all or most of the IoAT data to our data centers. So how are we going to accomplish enterprise data movement in this rapidly evolving market in a secure and reliable fashion, with guaranteed delivery? The answer is Hortonworks Data Flow (HDF) powered by Apache NiFi.
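The difference between filtering at the source and forwarding everything can be sketched in a few lines of Python. The example below uses a deadband filter, one simple filtering technique chosen for illustration; it is plain Python rather than HDF or NiFi code, and `deadband_filter` and `min_change` are hypothetical names.

```python
from typing import Iterable, Iterator, Optional


def deadband_filter(readings: Iterable[float], min_change: float) -> Iterator[float]:
    """Forward a reading only when it differs enough from the last one forwarded."""
    last_sent: Optional[float] = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) >= min_change:
            last_sent = value
            yield value


readings = [20.0, 20.1, 20.2, 21.5, 21.6, 19.0]

# Traditional approach: send only meaningful changes back to the data center.
print(list(deadband_filter(readings, min_change=1.0)))

# Full-fidelity approach: a threshold of zero forwards every reading.
print(list(deadband_filter(readings, min_change=0.0)))
```

With a threshold of 1.0 only three of the six readings leave the device, which saves bandwidth but discards detail that later analysis might need. That trade-off is the reason for moving all or most of the data when the network allows it.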

HDF, powered by Apache NiFi, is an enterprise-class data-in-motion application platform that has been running in production for the last eight years, with roots in the NSA. NiFi was donated to the Apache Software Foundation (ASF) through the NSA’s Technology Transfer Program in November 2014. Onyara was formed by NiFi’s original team of engineers and was then acquired by Hortonworks. HDF is built from the ground up to help organizations move data between geographically dispersed systems that acquire, process, analyze and store data. It is architected to run clustered in the data center and also to extend out to the edge. HDF manages the complexity of IoAT systems and data movement across different geographic locations and across legal and network domains. A key benefit of HDF is that it helps resolve differences between data connections, including protocol, format, schema, priority, size of event, frequency of event, authorization access, and relevance of event.
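To make two of those differences concrete, the Python sketch below brings events that arrive in different formats (JSON and CSV) into one common shape and then releases them by priority. It is a conceptual illustration only and does not use or imitate the HDF or NiFi API; the `Event` class, the field names, the default priority of 5, and the CSV column order are all assumptions.

```python
import csv
import heapq
import io
import json
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Event:
    priority: int                        # lower number = delivered first
    sequence: int                        # arrival order, breaks priority ties
    source: str = field(compare=False)
    payload: dict = field(compare=False)


def from_json(raw: str, sequence: int) -> Event:
    record = json.loads(raw)
    # Assumed default: events without an explicit priority get 5.
    return Event(record.get("priority", 5), sequence, record["device"], record)


def from_csv(raw: str, sequence: int) -> Event:
    # Assumed column order: device, priority, value
    device, priority, value = next(csv.reader(io.StringIO(raw)))
    payload = {"device": device, "value": float(value)}
    return Event(int(priority), sequence, device, payload)


def drain_in_priority_order(events: List[Event]) -> List[Event]:
    """Return the events sorted by (priority, arrival order) using a heap."""
    heap = list(events)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


events = [
    from_json('{"device": "pump-1", "value": 3.2}', 0),
    from_csv("valve-7,1,0.5", 1),
    from_json('{"device": "pump-2", "priority": 1, "value": 9.9}', 2),
]
for event in drain_in_priority_order(events):
    print(event.priority, event.source, event.payload)
```

Once every event has the same shape, downstream steps no longer need to know which protocol or format it arrived in. Urgent events can also overtake routine ones when the link is congested.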

As mentioned, HDF can run in a clustered mode, within your data center or on an Edge Node device. An HDF instance will require the following minimum specification:

Hortonworks Data Flow – Minimum Specification (Tiny):

Compute: Quad-core CPU
Memory: 1 GB RAM
Storage: Sufficient for the use case
Operating System Options: Linux, with additional operating systems coming soon

HDF instances also run in Small (single-core CPU, 4-16 GB RAM, 1 HD), Medium (dual-core CPU, 4-16 GB RAM, 6+ HDs), and Large (quad-core CPU, 64 GB-1 TB RAM, 12+ HDs) configurations, including clustering for scale within a data center. It’s important to note that outside of the data center, HDF is well suited to IoAT infrastructures that vary greatly with respect to power, space, and cooling.

Looking outside the data center, it’s important to note that today’s Edge Node gateways are perfect candidates for an HDF deployment. In addition, any application running at the edge that provides a RESTful API is also a great data source for HDF workflows. HDF is built for secure movement of IoAT data generated at different volumes, velocities and varieties across networks with significantly varying latency.

Figure 1: The Modern IoAT Application and Hortonworks

For the data architects and developers of modern applications, the following illustration provides a general build guideline. We have incorporated capabilities from both HDF and HDP to offer guidance to development teams building modern IoAT applications.

Figure 2: IOT Platform – Big Picture

The bottom line of this article is that organizations wanting to retain market leadership positions must harness new IoAT data and rapidly analyze and react to it, whether it’s Data-In-Motion or Data-At-Rest. Hortonworks is the ideal partner, and HDF combined with HDP is the ideal product pairing for these innovative use cases.

Read more about Hortonworks Data Flow and Hortonworks Data Platform

To learn more about powering the future of data, visit


Sujitha Sanku says:

A quick question why isn’t the picture Figure 2: IOT Platform – Big Picture, there is no intersection between the HDF and HDP platforms.
Can you please check on that?


manasa says:

Nice article. Your collection is very helpful. Thanks for sharing.
