Realtime Event Processing in Hadoop with NiFi, Kafka and Storm


Introduction

Welcome to a three-part tutorial series on real-time data ingestion and analysis. Today's processing systems have moved from classical batch reporting on data warehouses into the realm of real-time processing and analytics. The result is real-time business intelligence. Real-time means near-zero latency and access to information whenever it is required. This tutorial shows how geolocation information from trucks can be combined with sensor data from trucks and roads. These sensors report real-time events such as speeding, lane departure, unsafe tailgating, and unsafe following distances, and we will capture these events as they happen.
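To make "combining geolocation with sensor data" concrete, here is a minimal, in-memory sketch that attaches to each sensor event the most recent position reported by the same truck at or before the event time. The record types (`GeoFix`, `SensorEvent`, `EnrichedEvent`), their fields, and the `enrich` function are hypothetical illustrations, not the tutorial's simulator schema or code.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoFix:
    """Hypothetical geolocation report from a truck."""
    truck_id: int
    ts: int  # epoch seconds
    lat: float
    lon: float


@dataclass(frozen=True)
class SensorEvent:
    """Hypothetical sensor event, e.g. "Speeding" or "Lane Departure"."""
    truck_id: int
    ts: int
    event_type: str


@dataclass(frozen=True)
class EnrichedEvent:
    truck_id: int
    ts: int
    event_type: str
    lat: float
    lon: float


def enrich(events, fixes):
    """Pair each event with the latest fix of the same truck at or before the event time."""
    # Group fixes per truck in time order so each lookup is a binary search.
    fixes_by_truck = {}
    for fix in sorted(fixes, key=lambda f: f.ts):
        fixes_by_truck.setdefault(fix.truck_id, []).append(fix)
    times_by_truck = {t: [f.ts for f in fs] for t, fs in fixes_by_truck.items()}

    enriched = []
    for ev in events:
        times = times_by_truck.get(ev.truck_id, [])
        i = bisect_right(times, ev.ts)  # index of the first fix strictly after the event
        if i == 0:
            continue  # no position known yet for this truck, so the event is dropped
        fix = fixes_by_truck[ev.truck_id][i - 1]
        enriched.append(EnrichedEvent(ev.truck_id, ev.ts, ev.event_type, fix.lat, fix.lon))
    return enriched


if __name__ == "__main__":
    fixes = [GeoFix(1, 100, 40.71, -74.00), GeoFix(1, 160, 40.72, -73.99)]
    events = [SensorEvent(1, 150, "Speeding"), SensorEvent(1, 170, "Lane Departure")]
    for e in enrich(events, fixes):
        print(e)
```

Matching on the latest fix at or before the event avoids attaching a position the truck had not yet reached. A real streaming join would also have to deal with windows and late-arriving data, which this sketch ignores.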

Prerequisites

  • Downloaded and installed the latest Hortonworks Sandbox
  • Completed Learning the Ropes of the Hortonworks Sandbox
  • At least 8 GB of RAM (assigning more is recommended) and preferably 4 processor cores; otherwise you may encounter errors in the third tutorial
  • Data sets used:
      • New York City Truck Routes from NYC DOT
      • Truck Events Data generated using a custom simulator
      • Weather Data, collected using APIs from Forecast.io
      • Traffic Data, collected using APIs from MapQuest

All data sets used in these tutorials are real, but they have been modified to fit these use cases.

Tutorial Overview

Events generated by the sensors will be ingested and routed by Apache NiFi, then captured by Apache Kafka, a distributed publish-subscribe messaging system. We will use Apache Storm to process this data from Kafka and eventually persist it into HDFS and HBase.
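The division of labor among the three systems can be modeled with a toy, single-process sketch: an ingest step that parses and filters raw lines (the role NiFi plays), an append-only log read by offset (a stand-in for a Kafka topic), and a processing step that counts unsafe events per driver and writes them to a dictionary keyed like row keys (the role Storm and HBase play). None of this uses the real NiFi, Kafka, Storm, or HBase APIs. The CSV layout, the `"Normal"` event label, the `driver|<id>` row-key format, and all function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy stand-in for a Kafka topic: an append-only log that consumers read by offset."""
    name: str
    log: list = field(default_factory=list)

    def publish(self, record):
        self.log.append(record)
        return len(self.log) - 1  # offset of the new record

    def poll(self, offset, max_records=10):
        return self.log[offset:offset + max_records]


def route(raw_lines, topic):
    """Ingest step: parse 'truck_id,driver_id,event_type' lines, drop malformed ones, publish the rest."""
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # malformed record, filtered out before it reaches the topic
        truck_id, driver_id, event_type = parts
        topic.publish({"truck_id": truck_id, "driver_id": driver_id, "event_type": event_type})


def process(topic, sink, batch_size=2):
    """Processing step: consume the topic in batches, count non-'Normal' events per driver, persist to sink."""
    offset = 0
    counts = Counter()
    while True:
        batch = topic.poll(offset, batch_size)
        if not batch:
            break
        for record in batch:
            if record["event_type"] != "Normal":
                counts[record["driver_id"]] += 1
        offset += len(batch)  # advance the consumer's position, as a Kafka consumer tracks its offset
    for driver_id, n in counts.items():
        sink[f"driver|{driver_id}"] = n  # hypothetical row-key format
    return offset  # number of records consumed


if __name__ == "__main__":
    topic = Topic("truck_events")
    route(["1,10,Normal", "1,10,Speeding", "bad line", "2,20,Lane Departure"], topic)
    sink = {}
    print(process(topic, sink), sink)
```

Because the consumer only tracks an offset into the log, the processing step can stop and resume without the ingest step knowing about it. That decoupling is the main reason a message broker sits between ingestion and processing.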

Goals of the tutorial

  • Understand Real-time Data Analysis
  • Understand Apache NiFi Architecture
  • Create a NiFi DataFlow
  • Understand Apache Kafka Architecture
  • Create Consumers in Kafka
  • Understand Apache Storm Architecture
  • Create Spouts and Bolts in Storm
  • Persist data from Storm into Hive and HBase

Outline

  1. Concepts – Foundations of the technologies
  2. Tutorial 0 – Simulator, Apache Services and IDE Environment
  3. Tutorial 1 – Apache NiFi: Ingest, Filter and Land Real-Time Event Stream
  4. Tutorial 2 – Apache Kafka: Real-time event stream transportation
  5. Tutorial 3 – Ingest Real-Time Data into HBase & Hive using Storm