Apache NiFi is the first integrated platform that solves the real-time challenges of collecting and transporting data from a multitude of sources and provides interactive command and control of live flows with full and automated data provenance. NiFi provides the data acquisition, simple event processing, transport and delivery mechanism designed to accommodate the diverse dataflows generated by a world of connected people, systems, and things.
For the purposes of this tutorial, assume that a city planning board is evaluating the need for a new highway. The decision depends on current traffic patterns, particularly as other roadwork initiatives are under way. Integrating live data poses a problem because traffic analysis has traditionally relied on historical, aggregated traffic counts. To improve traffic analysis, the planning board wants to use real-time data to gain a deeper understanding of traffic patterns. NiFi was selected for this real-time data integration.
The goal of this tutorial is to give you an opportunity to interact with Apache NiFi features while building a dataflow. You do not need programming experience, or prior knowledge of flow-based programming syntax and features, to complete this tutorial successfully.
The learning objectives of this tutorial are to:
- Understand Apache NiFi fundamentals
- Explore NiFi’s HTML user interface
- Learn about NiFi processor configuration, relationships, data provenance, and documentation
- Create dataflows
- Incorporate APIs into a NiFi dataflow
- Learn about NiFi templates
- Create Process Groups
Before starting this tutorial, you should have:
- Downloaded the HDF Sandbox for VMware, VirtualBox, or native Docker
- Installed and deployed the HDF Sandbox for VMware, VirtualBox, or native Docker
- On Windows 10, access to the Ubuntu bash shell or the Sandbox Web Shell Client
In this tutorial, we work with San Francisco MUNI transit agency data gathered from the NextBus XML live feed, including vehicle locations, speeds, and other variables.
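To make the shape of this data concrete, the following is a minimal Python sketch of reading vehicle observations from an XML feed and emitting them as JSON. In the tutorial itself this work is done with NiFi processors, not hand-written code. The sample XML, its attribute names (`id`, `routeTag`, `lat`, `lon`, `speedKmHr`), and the helper `parse_vehicles` are illustrative assumptions, not a specification of the NextBus feed.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a vehicle-location feed.
# Attribute names and values are assumptions made for illustration.
SAMPLE_XML = """
<body>
  <vehicle id="1401" routeTag="N" lat="37.7603" lon="-122.5088" speedKmHr="24"/>
  <vehicle id="5523" routeTag="38" lat="37.7811" lon="-122.4645" speedKmHr="0"/>
</body>
"""


def parse_vehicles(xml_text):
    """Return one dict per <vehicle> element found in the XML text."""
    root = ET.fromstring(xml_text.strip())
    observations = []
    for vehicle in root.iter("vehicle"):
        observations.append({
            "vehicle_id": vehicle.get("id"),
            "route": vehicle.get("routeTag"),
            "latitude": float(vehicle.get("lat")),
            "longitude": float(vehicle.get("lon")),
            "speed_kmh": float(vehicle.get("speedKmHr")),
        })
    return observations


if __name__ == "__main__":
    # Serialize the parsed observations as JSON, one object per vehicle.
    print(json.dumps(parse_vehicles(SAMPLE_XML), indent=2))
```

Each vehicle becomes a flat record of identifier, route, position, and speed, which is roughly the kind of transit observation the later sections extract and store.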
The tutorial consists of seven sections:
- NiFi DataFlow Automation Concepts – Explore the fundamentals of dataflow management with NiFi, including core concepts and architecture.
- Launch NiFi HTML UI – Launch your NiFi HTML User Interface (UI). Get NiFi up and running on Hortonworks DataFlow Sandbox.
- Build a NiFi Process Group to Simulate NextBus API – Simulate the NextBus API live feed with a data seed and check the data generated by the simulator.
- Build a NiFi Process Group to Parse Transit Events – Parse the XML file for transit observations (vehicle location, speed, vehicle ID, etc.).
- Build a NiFi Process Group to Validate the GeoEnriched Data – Integrate the Google Places API to add more meaningful geographic insights and validate them.
- Build a NiFi Process Group to Store Data As JSON – Convert the XML data to JSON and store it in a file on the local file system.
- Ingest Live Vehicle Routes via NextBus API – Ingest NextBus’s live stream data for the San Francisco MUNI agency.
Each tutorial provides step-by-step instructions so that you can complete the learning objectives and tasks associated with it. Each tutorial also comes with a dataflow template that you can use to verify your work. Each tutorial builds on the previous one.