September 22, 2014

Real World Examples: Real-time Data From the Internet of Things (IoT)

Internet of Things (IoT) Potential and Process

It may seem obvious (or inevitable), but many companies are embracing the Internet of Things (IoT), and for good reasons, notes Forbes’ Mike Kavis. First, McKinsey Global Institute reports that IoT business will reach $6.2 trillion in revenue by 2025. Second, more and more objects are being embedded with sensors that communicate real-time data to data center networks for processing, explain McKinsey’s Chui, Loffler, and Roberts.

While both reasons may be true, what makes IoT possible, besides ubiquitous embedded sensors, is the sensors’ ability to transmit digestible data in real time and the capacity of Hadoop 2 clusters to absorb and process that data at petabyte scale. Processing sensor data at this scale involves three major steps:

  1. Data Ingestion (or harvesting)
  2. Data Storage (or persisting)
  3. Data Analytics (or deriving value)


These three steps are possible because today’s Modern Data Architecture (MDA), with Apache Hadoop YARN as its architectural center, allows multi-purpose data processing engines to access and transform the same data within the same cluster.

In this blog, we briefly introduce three tutorials for the Sandbox, written by Saptak Sen of Hortonworks. They employ two complementary technologies, Apache Kafka and Apache Storm, both of which run on Hortonworks Data Platform (HDP) and are essential for handling sensor data at scale.

Of the three major steps outlined above for IoT, these two components exemplify the first step: data ingestion. We will explore data storage and data analytics in subsequent tutorials and respective blogs.

These tutorials illustrate how Kafka and Storm capture, ingest, and process sensor data from trucks, combining geo-location with real-time events such as speeding, lane departure, and unsafe tailgating. They demonstrate how real-time data processing can be achieved in a Hadoop cluster.

Real-time Data Production and Ingestion

Data must originate from somewhere. For example, an embedded sensor can produce data at frequent intervals. A consumer can fetch it from a live data stream or read it from a committed log file. In both cases, for each datum, there is a producer and a consumer. This produce-and-consume paradigm is at the core of any messaging system. Apache Kafka is a publish-subscribe messaging system designed as a distributed commit log. Producers publish data to Kafka, and consumers read from it.
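To make the produce-and-consume idea concrete, here is a minimal in-memory sketch in Python. It is not Kafka's API: the `CommitLog` and `Consumer` classes and the sample readings are hypothetical, invented for illustration. It only shows the commit-log idea that producers append records, and each consumer reads from its own offset without removing anything.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Toy stand-in for an append-only log (hypothetical, not Kafka's API)."""
    records: list = field(default_factory=list)

    def append(self, record):
        """Producer side: append a record and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Consumer side: return records from offset onward, leaving the log intact."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer tracks its own position in the shared log."""
    log: CommitLog
    offset: int = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
for reading in ("truck-1:55mph", "truck-2:72mph", "truck-1:58mph"):
    log.append(reading)  # a sensor acting as the producer

dashboard = Consumer(log)
archiver = Consumer(log)

first = dashboard.poll(max_records=2)  # dashboard reads the first two records
everything = archiver.poll()           # archiver independently reads all three
rest = dashboard.poll()                # dashboard resumes where it left off
```

Because reads do not delete records, several consumers can process the same stream at different speeds, which is the property that makes a log a convenient ingestion buffer.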

Real-time Data Ingestion Tutorials

In the first tutorial, we show how you can use Apache Kafka to produce truck events.

Whereas the first tutorial shows how to produce truck events with Kafka, the second tutorial demonstrates how to capture and consume these events in real time with an Apache Storm cluster.
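As a rough illustration of what consuming truck events involves, the sketch below chains plain Python generators in the way a Storm topology chains a spout (a source of events) and bolts (processing steps). It does not use Storm or Kafka. The event fields, sample values, and function names are assumptions made for this example, since the tutorials' actual schema is not shown in this post.

```python
# Hypothetical event shape and values, invented for illustration only.
SAMPLE_EVENTS = [
    {"truck_id": 1, "event": "normal", "lat": 38.44, "lon": -90.27},
    {"truck_id": 2, "event": "speeding", "lat": 38.52, "lon": -90.31},
    {"truck_id": 1, "event": "lane departure", "lat": 38.45, "lon": -90.28},
    {"truck_id": 2, "event": "unsafe tailgating", "lat": 38.53, "lon": -90.33},
    {"truck_id": 3, "event": "normal", "lat": 38.60, "lon": -90.40},
]

UNSAFE_EVENTS = {"speeding", "lane departure", "unsafe tailgating"}


def event_source(events):
    """Plays the role of a spout: emit events one at a time."""
    for event in events:
        yield event


def flag_unsafe(stream):
    """Plays the role of a bolt: pass through only the unsafe events."""
    for event in stream:
        if event["event"] in UNSAFE_EVENTS:
            yield event


def count_by_truck(stream):
    """A second bolt-like step: tally unsafe events per truck."""
    counts = {}
    for event in stream:
        counts[event["truck_id"]] = counts.get(event["truck_id"], 0) + 1
    return counts


alerts = list(flag_unsafe(event_source(SAMPLE_EVENTS)))
alerts_per_truck = count_by_truck(alerts)
```

In a real deployment the source would read from a message queue rather than a list, and each step could run in parallel across the cluster. The generator chain only mirrors the shape of the data flow.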

Finally, no data processing tutorial in a Hadoop cluster can escape the obligatory WordCount example. In that tradition, the third tutorial shows how to process and count words in real time using Apache Storm.
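For readers new to the idea, a streaming word count differs from a batch one in that it emits an updated count each time a word arrives, rather than a single total at the end. The following is a minimal Python sketch of that behavior under simplifying assumptions (single process, whitespace tokenization, made-up sentences). The function names are hypothetical and this is not the tutorial's Storm code.

```python
from collections import Counter


def sentence_source(sentences):
    """Emit sentences one at a time, like an unbounded input stream."""
    yield from sentences


def split_words(stream):
    """Split each sentence into lowercase words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def running_counts(stream):
    """Emit (word, count so far) after every word, not just at the end."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


sentences = ["the truck sped", "the truck stopped"]
updates = list(running_counts(split_words(sentence_source(sentences))))
totals = dict(updates)  # the last update for each word is its latest count
```

Each element of `updates` is an intermediate result that a downstream consumer could act on immediately, which is the essential difference from the batch version of WordCount.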

Discover and Learn More

  • Try other real world examples on Hortonworks Sandbox
  • Download Hortonworks Sandbox