Welcome to a three-part tutorial series on real-time data ingestion and analysis. As processing systems have become faster, analytics has moved from classical data-warehouse batch reporting to real-time processing and analytics. The result is real-time business intelligence. Real-time means near-zero latency and access to information whenever it is required. This tutorial series shows how geolocation information from trucks can be combined with sensor data from trucks and roads. These sensors report real-time events such as speeding, lane departure, unsafe tailgating, and unsafe following distances. We will capture these events as they happen.
- Downloaded and installed the latest Hortonworks Sandbox
- Learning the Ropes of the Hortonworks Sandbox
- 8 GB+ RAM (assigning more is recommended) and preferably 4 processor cores; otherwise you may encounter errors in the third tutorial
- Data sets used:
  - New York City Truck Routes from NYC DOT.
  - Truck Events Data generated using a custom simulator.
  - Weather Data, collected using APIs from Forecast.io.
  - Traffic Data, collected using APIs from MapQuest.
All data sets used in these tutorials are real data sets, modified to fit these use cases.
The events generated by the sensors will be ingested and routed by Apache NiFi, then captured by Apache Kafka, a distributed publish-subscribe messaging system. We will use Apache Storm to process this data from Kafka and persist it into HDFS and HBase.
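As a rough mental model of the flow described above, here is a minimal in-memory Python sketch. It does not use the real NiFi, Kafka, Storm, or HBase APIs: a `queue.Queue` stands in for a Kafka topic and a plain dict stands in for an HBase table. The function names, the event fields, and the `"Normal"` event type are hypothetical and only mirror the role each stage plays.

```python
"""Conceptual, in-memory sketch of the NiFi -> Kafka -> Storm -> HBase flow.

Nothing here uses the real NiFi, Kafka, Storm, or HBase APIs; every name
below (functions, fields, the "Normal" event type) is hypothetical.
"""
from queue import Queue, Empty

# Hypothetical schema for a truck event: who, which truck, what, and where.
REQUIRED_FIELDS = ("driver_id", "truck_id", "event_type", "lat", "lon")


def nifi_ingest(raw_events, topic):
    """NiFi's role: filter out malformed events and route the rest to a topic."""
    for event in raw_events:
        if all(field in event for field in REQUIRED_FIELDS):
            topic.put(event)  # the Queue stands in for a Kafka topic


def storm_process(topic, store):
    """Kafka consumer + Storm bolt's role: drain the topic, count unsafe
    (non-"Normal") events per driver, and write each running count into
    `store`, a dict standing in for an HBase table keyed by driver_id."""
    while True:
        try:
            event = topic.get_nowait()
        except Empty:
            break  # topic drained; a real Storm topology would keep running
        if event["event_type"] != "Normal":
            driver = event["driver_id"]
            store[driver] = store.get(driver, 0) + 1
    return store


if __name__ == "__main__":
    events = [
        {"driver_id": 1, "truck_id": 10, "event_type": "Speeding", "lat": 40.7, "lon": -74.0},
        {"driver_id": 1, "truck_id": 10, "event_type": "Normal", "lat": 40.7, "lon": -74.0},
        {"driver_id": 2, "truck_id": 20, "event_type": "Lane Departure", "lat": 40.8, "lon": -73.9},
        {"driver_id": 3, "event_type": "Speeding"},  # malformed: dropped at ingest
    ]
    topic = Queue()
    nifi_ingest(events, topic)
    print(storm_process(topic, {}))  # {1: 1, 2: 1}
```

The point of the sketch is the separation of concerns: ingestion and filtering happen before the message bus, and the processing stage only reads from the bus and writes to storage, so each stage can be scaled or replaced independently.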
- Understand Real-time Data Analysis
- Understand Apache NiFi Architecture
- Create NiFi DataFlow
- Understand Apache Kafka Architecture
- Create Consumers in Kafka
- Understand Apache Storm Architecture
- Create Spouts and Bolts in Storm
- Persist data from Storm into Hive and HBase
- Concepts – Foundations of the Technologies
- Tutorial 0 – Simulator, Apache Services and IDE Environment
- Tutorial 1 – Apache NiFi: Ingest, Filter and Land Real-Time Event Stream
- Tutorial 2 – Apache Kafka: Real-time event stream transportation
- Tutorial 3 – Ingest Real-Time Data into HBase & Hive using Storm