Apache NiFi is the first integrated platform that solves the real-time challenges of collecting and transporting data from a multitude of sources and provides interactive command and control of live flows with full and automated data provenance. NiFi provides the data acquisition, simple event processing, transport and delivery mechanism designed to accommodate the diverse dataflows generated by a world of connected people, systems, and things.
For the purposes of this tutorial, assume that a city planning board is evaluating the need for a new highway. This decision is dependent on current traffic patterns, particularly as other roadwork initiatives are under way. Integrating live data poses a problem because traffic analysis has traditionally been done using historical, aggregated traffic counts. To improve traffic analysis, the city planner wants to leverage real-time data to get a deeper understanding of traffic patterns. NiFi was selected for this real-time data integration.
The goal of this tutorial is to give you an opportunity to interact with Apache NiFi features while building a dataflow. You do not need programming experience or prior knowledge of flow-based programming syntax and features to complete this tutorial.
The learning objectives of this tutorial are to:
- Understand Apache NiFi fundamentals
- Explore NiFi’s HTML user interface
- Learn about NiFi processor configuration, relationships, data provenance, and documentation
- Create dataflows
- Incorporate APIs into a NiFi dataflow
- Learn about NiFi templates
- Create Process Groups
Before starting this tutorial, make sure that you have met the following prerequisites:
- Downloaded and installed the HDF Sandbox for VMware, VirtualBox, or native Docker
- For Windows 10 users, use the Ubuntu bash shell
In this tutorial, we work with data from the San Francisco MUNI transit agency, gathered from the NextBus XML Live Feed, including vehicle locations, speeds, and other variables.
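To make the shape of this data concrete, the following Python sketch shows the kind of transformation the dataflow performs: reading vehicle attributes from an XML document and emitting them as JSON. It is only an illustration and is not part of the tutorial, which builds the equivalent logic with NiFi processors and requires no programming. The sample XML, its attribute names (`id`, `routeTag`, `lat`, `lon`, `speedKmHr`), and the `vehicles_to_json` helper are assumptions made for this example; the actual feed may differ.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a vehicle-location feed.
# Element names, attribute names, and values are assumptions for illustration.
SAMPLE_XML = """<body>
  <vehicle id="1453" routeTag="N" lat="37.7753" lon="-122.4186" speedKmHr="24"/>
  <vehicle id="1521" routeTag="J" lat="37.7644" lon="-122.4270" speedKmHr="0"/>
</body>"""


def vehicles_to_json(xml_text):
    """Extract per-vehicle attributes from XML and return them as a JSON string."""
    root = ET.fromstring(xml_text)
    records = []
    for vehicle in root.iter("vehicle"):
        records.append({
            "vehicle_id": vehicle.get("id"),
            "route": vehicle.get("routeTag"),
            # Coordinates and speed arrive as strings; convert them to numbers.
            "latitude": float(vehicle.get("lat")),
            "longitude": float(vehicle.get("lon")),
            "speed_kmh": float(vehicle.get("speedKmHr")),
        })
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    print(vehicles_to_json(SAMPLE_XML))
```

In the NiFi dataflow, each of these steps (ingest, attribute extraction, JSON conversion) is handled by a configured processor rather than by hand-written code.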
The tutorial consists of four sections:
- Tutorial 0 – Launch your NiFi HTML User Interface (UI). Get NiFi up and running on Hortonworks DataFlow Sandbox.
- Tutorial 1 – Open NiFi UI and explore its features. Create a dataflow by adding and configuring eleven processors. Ingest data from a transit location XML simulator, extract transit location detail attributes from flowfiles, and route attributes to a converted JSON file. Run the dataflow and verify the results in a terminal.
- Tutorial 2 – Add geographic location enrichment to the dataflow; incorporate the Google Places Nearby API into the dataflow to retrieve places near the vehicle’s location.
- Tutorial 3 – Ingest NextBus’s live stream data for the San Francisco MUNI agency.
Each tutorial provides step-by-step instructions so that you can complete the learning objectives and tasks associated with it. You are also provided with a dataflow template for each tutorial that you can use for verification. Each tutorial builds on the previous one.