November 28, 2016

Enterprise NiFi: Implementing Reusable Components and a Software Development Lifecycle

Originally posted in HCC

1. Introduction

NiFi is a powerful and easy-to-use technology for building dataflows from diverse sources to diverse targets, transforming and dynamically routing data in between. NiFi is packaged in HDF 2.0, which (in addition to bundling Kafka and Storm for a complete data movement platform) pushes NiFi to enterprise readiness with Ambari management, Ranger security and multitenancy.

One of the hallmarks of NiFi is its awesome drag-and-drop UI, which makes building and configuring dataflows drop-dead easy. However, when the same parts of a flow are used repeatedly across projects, within a team or across the organization, the UI can slow development down by forcing the same manual steps to rebuild the same pieces from scratch.

You can overcome this problem by using two features of NiFi (templates, and configurations that use Expression Language references) to build a library of reusable components that serve as pre-built parts of new flows. Doing so provides the following advantages to the team and the enterprise:

  • rapid development through component reuse
  • adoption of standards and patterns through component reuse
  • code that can be change-managed and implemented in an SDLC (Software Development Lifecycle) similar to any other software code, including promotion of the same code base across dev, test and production environments.

Note: for enterprise NiFi security, see these two valuable posts:

https://community.hortonworks.com/articles/60842/hdf-20-defining-nifi-policies-in-ranger.html
https://community.hortonworks.com/articles/60001/hdf-20-integrating-secured-nifi-with-secured-range.html

2. Overview of NiFi Reusable Component Technology

Reusable NiFi components center around a template and its configurations as shown below.

Template

  • templates are created from components built in the UI and saved to the NiFi environment (through the UI); they are shared between NiFi clusters by downloading to a local machine, sharing, and uploading to a new NiFi environment (via the UI or the REST API)
  • templates are XML
  • templates can be made from any subset of a flow: a single processor, a flattened subflow, or a process group holding a subflow
  • alternatively, templates can be made from full flows
  • templates can be uploaded to the UI and used as a starting point for a flow (all configurations from the downloaded template are retained and can then be kept or changed)
  • resulting flows from templates can themselves be downloaded as templates and deployed across environments (or implemented as reusable components in new flows)

Configurations

Configuration properties can be Expression Language (EL) references to

  • system properties
  • OS environment variables
  • custom properties written in a file

But note that:

  • EL references can only be used in processor properties, and only in those marked “Supports expression language: true” (shown by clicking the question mark next to the property)
  • EL references have the form ${property.name}, e.g. ${MYSQL_PWD} to reference a password set in the NiFi server’s operating system environment, or ${hdfs.zone.landing} to reference an HDFS path written in a custom property file
  • EL references can be concatenated with literal text, e.g. /data/${hdfs.zone.landing}
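
To make the reference form concrete, here is a minimal Python sketch of how a plain ${name} reference inside a property value could be expanded. It is an illustration only, not NiFi's implementation: the function name expand and the sample values are hypothetical, only simple names are handled (real EL also supports functions), and unknown names are deliberately left untouched so they stay visible.

```python
import re

# Matches only plain references such as ${hdfs.zone.landing}; real NiFi EL
# also supports functions, which this sketch ignores.
_REFERENCE = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def expand(value, properties):
    """Replace each ${name} in value with properties[name]; leave unknown names as-is."""
    def lookup(match):
        name = match.group(1)
        return properties.get(name, match.group(0))
    return _REFERENCE.sub(lookup, value)


# Hypothetical values, for illustration only.
props = {"hdfs.zone.landing": "landing", "MYSQL_PWD": "secretpwd"}
print(expand("/data/${hdfs.zone.landing}", props))
print(expand("${not.defined}/x", props))
```

The first call shows concatenation of literal text with a reference; the second shows a reference that has no definition in the given properties.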


Configuration: OS Environment Variables

Export environment variable (best for sensitive values), e.g. export MYSQL_PWD=secretpwd

Configuration: Custom Property Files

To implement a file with custom property name=value pairs to be referenced by EL, do the following:

  1. On each NiFi server in your cluster, open the nifi.properties file
  2. Set the field nifi.variable.registry.properties to the path of your custom property file (or to a comma-separated list of custom property files)
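
As a rough picture of what such a file holds, the Python sketch below parses simple name=value lines into a dictionary. The file contents, the path in the comment and the function name load_properties are all hypothetical, and the parser is a simplification that assumes one pair per line with # comments; it is not how NiFi itself reads the file.

```python
def load_properties(text):
    """Parse simple name=value lines, skipping blanks and # comments."""
    properties = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            properties[name.strip()] = value.strip()
    return properties


# Hypothetical contents of a custom property file, e.g. one referenced by
# nifi.variable.registry.properties=/etc/nifi/custom.properties (path assumed).
CUSTOM_FILE = """
# HDFS zones
hdfs.zone.landing=landing
hdfs.zone.archive=archive
"""

custom = load_properties(CUSTOM_FILE)
print(custom["hdfs.zone.landing"])
```

A processor property set to /data/${hdfs.zone.landing} would then pick up the value defined for hdfs.zone.landing in this file.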

Precedence for EL references

Properties referenced through the EL should have unique names. Properties that share a name are resolved with the following precedence (i.e. the value of the one found first in the sequence below is used):

Processor attribute -> Flow File attribute -> custom property file attribute -> system property -> OS environment variable
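
The "first one found wins" rule can be sketched in Python with collections.ChainMap, which searches a list of mappings in order. This only models the lookup order stated above; the dictionaries and their values are hypothetical and the sketch says nothing about how NiFi implements the search.

```python
from collections import ChainMap

# Hypothetical values at each level, listed from highest to lowest precedence.
processor_attrs = {}
flowfile_attrs = {"filename": "orders.csv"}
custom_props = {"hdfs.zone.landing": "landing", "env.name": "from-file"}
system_props = {"env.name": "from-system-property"}
os_env = {"MYSQL_PWD": "secretpwd", "env.name": "from-os"}

# ChainMap searches its maps left to right and returns the first match.
lookup = ChainMap(processor_attrs, flowfile_attrs, custom_props, system_props, os_env)

print(lookup["env.name"])
print(lookup["MYSQL_PWD"])
```

Here env.name is defined at three levels, and the custom property file value shadows the other two. That shadowing is exactly why unique names are recommended.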

Also see the following for more on using EL to reference properties:

https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua.html
https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_user-guide/content/Using_Custom_Properties.html

3. Building a Reusable Component

Let’s say that I want to build the following component to be reused for many different flows.

My intent here is to read a file (GetFile), do a simple transformation (ReplaceText) and output the result to any processor in a flow I build in the future. I also want to push the original file content to HDFS for historical analysis. Putting to HDFS is not so simple: I should first MergeContent so that HDFS files do not each hold a single line of data, and I should UpdateAttribute so that the filename in HDFS is the original name from GetFile with a timestamp appended to distinguish files. Let’s build this into a reusable component.

Step 1: Build in UI (including configurations)

Build the reusable component in your UI. Configure everything with either empty, hard-coded or EL values. (Best practices around this are discussed later.)

Note that the template has processor errors. In the case of GetFile, the input directory is not configured. In the case of ReplaceText, the Success relationship is not connected or auto-terminated. That is OK: whoever uses this component will configure these later according to his or her specific flow.

Step 2: Save and Download Template

  1. Select the processors and connections that will make up your reusable component (select all, or select each processor and connection separately).
  2. Save the template.
  3. After you name it, go to Templates to see it listed among your logged-in user’s templates.
  4. To share the template with others, download it from the NiFi Templates list (icon to the left of the trash can). To delete it from the list, click the trash can.

4. Using a Reusable Component

Using a reusable component is the reverse of the above.

  1. If the component is not in your NiFi Templates list, copy it to your local file system and upload it.
  2. Grab the template icon, drag it onto your canvas and choose the template you want to add.
    Clicking Add places it on your canvas in the same way as adding a processor. It is added in the same state as when it was downloaded by the person who built it (unless the template’s XML was changed manually after the original download).
  3. Change any configurations you need to change, and connect it to the rest of the flow you want to build.

5. How does this work?

You can reuse (instantiate) a single component as many times as you want in a single flow and in as many hierarchical levels of process groups as you want. How does this work? How does NiFi instantiate each separately? When you drag a processor, connection, or process group onto the canvas, each is given a UUID like 9fc758e3-0157-1000-e89d-a6033019f0cf. The first part of this is a global id and the second part is an instance id. When you download it as a template, the global id is retained but the instance id is set to 0s, e.g. 9fc758e3-0157-1000-0000-000000000000. When you upload a template and drag it to the canvas, the instance id is converted from 0s to a new unique instance id, e.g. 9fc758e3-0157-1000-e17b-1bc0cb0c1921. Simple but powerful.
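
The id handling described above can be mimicked with a short Python sketch. It assumes, based on the example ids, that the global part is the first three dash-separated groups and the instance part is the last two. The function names are hypothetical, and filling the instance part with random hex is an assumption made for illustration; the article does not say how NiFi generates the new instance id.

```python
import secrets


def to_template_id(component_id):
    """Keep the global part (first three groups) and zero the instance part."""
    groups = component_id.split("-")
    if len(groups) != 5:
        raise ValueError("expected a UUID with five dash-separated groups")
    return "-".join(groups[:3] + ["0" * len(g) for g in groups[3:]])


def instantiate(template_id):
    """Keep the global part and fill the instance part with fresh random hex."""
    groups = template_id.split("-")
    # token_hex(n) yields 2*n hex characters, so halve each group's length.
    fresh = [secrets.token_hex(len(g) // 2) for g in groups[3:]]
    return "-".join(groups[:3] + fresh)


original = "9fc758e3-0157-1000-e89d-a6033019f0cf"
template = to_template_id(original)
print(template)  # 9fc758e3-0157-1000-0000-000000000000
print(instantiate(template))
```

Each call to instantiate yields an id that shares the global prefix with the template but differs in its instance part, which is what lets the same component appear many times on one canvas.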

6. Software Development Lifecycle (SDLC)

Templates and custom configuration files can be considered as code and thus easily integrated into a typical SDLC.
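
Treating templates as code also means they can be checked automatically before promotion. As one possible check, the Python sketch below scans template XML text for plain ${name} references and reports those that have no definition in the target environment. The XML fragment, the function name undefined_references and the set of defined names are hypothetical, and the regular expression only recognizes simple names.

```python
import re

# Matches only plain references such as ${MYSQL_PWD}; EL functions are ignored.
_REFERENCE = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def undefined_references(template_xml, defined_names):
    """Return the sorted names referenced as ${name} in the template but not defined."""
    referenced = set(_REFERENCE.findall(template_xml))
    return sorted(referenced - set(defined_names))


# Hypothetical template fragment; real template XML has more structure.
TEMPLATE = """
<entry><key>Directory</key><value>/data/${hdfs.zone.landing}</value></entry>
<entry><key>Password</key><value>${MYSQL_PWD}</value></entry>
"""

# Hypothetical names defined in the test environment (property file plus OS variables).
test_env = {"hdfs.zone.landing"}
print(undefined_references(TEMPLATE, test_env))
```

A check like this would also flag references to FlowFile attributes such as ${filename}, which are only known at runtime, so in practice it would need an allow-list for those.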


Summary:

  1. Reusable components are added as templates to a central repository. This should be governed. Reusable components are probably best represented as process groups. This makes building new flows simpler by separating the reusable components (and encapsulating their details) from the new flow components.
  2. Development groups pull reusable components and upload them to their NiFi environment to build new flows. In flow configurations, sensitive values should be configured as Expression Language references to OS environment variables that you set in each environment, e.g. ${MYSQL_PSSWRD}. Other environment-specific config values should similarly use Expression Language references. If these are not sensitive, they should be placed in a custom properties file.
  3. Developers finalize their flows using other components and configurations, and submit the template of the final flow (and required custom property files) to version control, e.g. Git.
  4. Template and custom property files are promoted to each environment just as source code typically is.
  5. Automation: deploying templates to environments can be done via the NiFi REST API integrated with other automation tools.
  6. Governance bodies decide which configurations can be changed in real time (e.g. ControlRate properties). These changes do not need to go through version control and can be made by authorized admins on the fly. For authorization policies, see: https://community.hortonworks.com/articles/60842/hdf-20-defining-nifi-policies-in-ranger.html
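
For the automation step, a deployment script might build a template upload request along the following lines. This Python sketch only constructs the request and does not send it. The endpoint path, the "template" form field name, the host and the process group id are assumptions that should be verified against the REST API documentation for your NiFi version, and a secured cluster would also need authentication.

```python
import urllib.request
import uuid


def build_upload_request(base_url, group_id, template_xml, filename="template.xml"):
    """Build (but do not send) a multipart POST that uploads a template.

    The endpoint path and the 'template' form field are assumptions; check
    the REST API documentation for your NiFi version.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="template"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/xml\r\n\r\n",
        template_xml,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = f"{base_url.rstrip('/')}/nifi-api/process-groups/{group_id}/templates/upload"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Hypothetical host and process group id.
request = build_upload_request("http://nifi-test:8080", "root", b"<template/>")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the script easy to dry-run in a pipeline before it touches a real environment.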

Shoutouts and links

Many thanks to NiFi SMEs for validation of technical aspects of reuse ideas, particularly Andy LoPresto, Andrew Grande, Andrew Psaltis, Koji Kamimura and Matt Burgess.

Useful links:

https://nifi.apache.org/docs/nifi-docs/
https://nifi.apache.org/docs/nifi-docs/html/getting-started.html
https://community.hortonworks.com/content/kbentry/16461/nifi-understanding-how-to-use-process-groups-and-r.html
https://community.hortonworks.com/articles/57304/supporting-custom-properties-for-expression-langua.html
https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.0/bk_user-guide/content/Using_Custom_Properties.html
