October 17, 2016

HDF 2.0: Apache NiFi integration with Apache Ambari/Ranger

We recently hosted a webinar on HDF 2.0 and the integration between Apache NiFi, Apache Ambari, and Apache Ranger. We thought we would share the questions and answers from the webinar, and also compile the relevant material in a single place to make it easy to find and reference.

Should you have any more questions, at any time, we encourage you to check out the Data Ingestion & Streaming track of Hortonworks Community Connection, where an entire community of folks is monitoring and responding to questions. You may also want to check out our HDF home page, HDF support, and HDF operations training pages.

Highlights of integrating Apache NiFi with Apache Ambari/Ranger

The mechanics of setting up HDF 2.0 using Apache Ambari’s Install Wizard are outlined in the official documentation here, and sample steps to automate the setup via Ambari blueprints are provided here. Some features NiFi administrators can leverage when using Ambari-managed HDF 2.0 clusters versus NiFi standalone are as follows:

  1. Ease of Deployment – Users have the choice of deploying NiFi through the Ambari install wizard or operationalizing the setup via blueprint automation (see the first sketch following this list)
  2. Ease of Configuration – Ambari allows configuration to be done once across the cluster. This saves time: when setting up NiFi standalone, users have to manage configuration files on every node NiFi runs on.
    1. Update configs via the Ambari REST API (see the second sketch following this list)
    2. Configuration history is available, meaning users can diff versions, revert to older versions, etc.
    3. Host-specific configurations can be managed using the ‘Config groups’ feature
    4. ‘Common’ configs are grouped together and exposed in the first config section (‘Advanced NiFi-ambari-config’) to allow configuration of commonly used properties
    5. Contents of NiFi.properties are exposed under ‘Advanced NiFi-properties’ as key/value pairs with help text
    6. Other property-based configuration files are exposed as Jinja templates
    7. Other XML-based config files are also exposed as Jinja templates
    8. The most important NiFi config files are exposed and managed via Ambari (e.g. NiFi.properties, bootstrap.conf, etc.)
  3. Ease of Monitoring
    1. Logsearch integration is included for ease of visualizing/debugging NiFi logs without connecting to the system
  4. Ease of Security
    1. NiFi Identity Mappings
    2. Active Directory/LDAP Integration
    3. SSL for NiFi
    4. Ranger Integration with NiFi
    5. Kerberos for NiFi
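
As mentioned in item 1 above, the install can be scripted end to end with Ambari blueprints. Here is a minimal sketch of that flow, assuming an Ambari server at ambari.example.com with default credentials and the HDF 2.0 management pack already registered; the component name NIFI_MASTER and all host names are illustrative, so verify them against your stack definition before reusing this.

```python
import json
import requests

AMBARI = "http://ambari.example.com:8080/api/v1"   # hypothetical Ambari server
AUTH = ("admin", "admin")                          # default credentials
HEADERS = {"X-Requested-By": "ambari"}             # required by the Ambari REST API

# 1. Register a one-node blueprint that co-locates NiFi and ZooKeeper.
blueprint = {
    "Blueprints": {"blueprint_name": "hdf-singlenode",
                   "stack_name": "HDF", "stack_version": "2.0"},
    "host_groups": [{
        "name": "master",
        "cardinality": "1",
        "components": [{"name": "NIFI_MASTER"},      # assumed component name
                       {"name": "ZOOKEEPER_SERVER"}],
    }],
}
r = requests.post(f"{AMBARI}/blueprints/hdf-singlenode",
                  auth=AUTH, headers=HEADERS, data=json.dumps(blueprint))
r.raise_for_status()

# 2. Instantiate a cluster from the blueprint by mapping host groups to hosts.
cluster_template = {
    "blueprint": "hdf-singlenode",
    "host_groups": [{"name": "master",
                     "hosts": [{"fqdn": "nifi1.example.com"}]}],
}
r = requests.post(f"{AMBARI}/clusters/HDFCluster",
                  auth=AUTH, headers=HEADERS, data=json.dumps(cluster_template))
r.raise_for_status()
print("Cluster creation request accepted")
```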
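
And a minimal sketch of item 2.1, updating a NiFi config type through the Ambari REST API instead of editing files on each node. The config type name nifi-ambari-config and the property being changed are assumptions; list Clusters/desired_configs on your cluster to find the real names. Because Ambari stores each update as a new tagged version, this same mechanism is what enables the diff/revert behavior in item 2.2.

```python
import json
import time
import requests

AMBARI = "http://ambari.example.com:8080/api/v1"
CLUSTER = "HDFCluster"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}
CONFIG_TYPE = "nifi-ambari-config"  # assumed config type name

# 1. Find the tag of the currently active version of this config type.
r = requests.get(f"{AMBARI}/clusters/{CLUSTER}",
                 params={"fields": "Clusters/desired_configs"},
                 auth=AUTH, headers=HEADERS)
r.raise_for_status()
tag = r.json()["Clusters"]["desired_configs"][CONFIG_TYPE]["tag"]

# 2. Fetch the current properties stored under that tag.
r = requests.get(f"{AMBARI}/clusters/{CLUSTER}/configurations",
                 params={"type": CONFIG_TYPE, "tag": tag},
                 auth=AUTH, headers=HEADERS)
r.raise_for_status()
properties = r.json()["items"][0]["properties"]

# 3. Change a property and publish the result as a new desired_config version.
properties["nifi.node.port"] = "9091"   # hypothetical property/value
new_version = {"Clusters": {"desired_config": {
    "type": CONFIG_TYPE,
    "tag": f"version{int(time.time() * 1000)}",
    "properties": properties,
}}}
r = requests.put(f"{AMBARI}/clusters/{CLUSTER}",
                 auth=AUTH, headers=HEADERS, data=json.dumps(new_version))
r.raise_for_status()
# Affected components then need a restart, via the Ambari UI or API.
```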


11 Questions and Answers from the HDF 2.0 and Apache Ambari Webinar

  1. What’s the difference between HDP and HDF?
    • Hortonworks Data Platform (HDP) is a grouping of Apache projects to support Data at Rest. Hortonworks DataFlow (HDF) is a grouping of Apache projects to support Data in Motion. The two are complementary, share some similar projects, and are used for different purposes.
  2. Shouldn’t NiFi be installed only on the edge node and not on all the nodes?
    • NiFi is usually installed on edge nodes (traditionally, clients to a Hadoop cluster or a gateway to some other set of internal servers).
  3. Does NiFi have an SFDC processor?
    • No, but you can integrate with SFDC via its REST API (see the first sketch following this Q&A list).
    • https://community.hortonworks.com/questions/12892/salesforce-integration-with-hortonworks-data-flow.html
  4. Is there a pre-defined set of rules to be applied to the incoming data in NiFi? Also, can custom rules be stored by multiple users?
    • NiFi does not have a pre-defined set of rules for data; out of the box it lets users create and execute a data flow that retrieves, cleanses/transforms, or routes data as needed. The flow would need to be configured to enforce any business- or application-specific rules required for incoming data. Using templates, the same logic can be reused in other data flows or imported into other NiFi clusters (see the second sketch following this Q&A list).
  5. When transporting incoming data via NiFi, can we expose the incoming data to downstream systems via a REST API?
    • Yes, NiFi can be used to create a flow that includes processors to send data to downstream systems. For REST APIs, a PostHTTP processor can be included in a flow to send data to another system. InvokeHTTP can be used as well to support additional HTTP methods (such as GET, DELETE, PUT, etc.).
  6. Are there some pre-defined workflow templates in NiFi that can be reused / enhanced by the team working on it?
  7. What is the recommended approach to NiFi installation for a multi-tenant single Hadoop cluster?
  8. Is Ambari only for initial installation of HDF or also for upgrades?
    • Ambari is currently used only for the initial installation of HDF, because this is the first release. In the future, it will support upgrades.
  9. Can you configure HDP and HDF with a single instance of Ambari when starting from scratch?
    • Not at this time. This is an upcoming feature.
  10. I believe only a dedicated Ambari server setup is required for each of HDP and HDF. Can all other nodes be shared between HDP and HDF components?
    • Currently, nodes cannot be shared between HDP and HDF. Completely separate clusters (each with its own Ambari and Ranger) are required at this point.
  11. Are Ambari metrics schedulable/downloadable in PDF or other formats?
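
For question 3, here is a minimal sketch of the Salesforce REST call such a flow would make. In NiFi this request would typically be issued by an InvokeHTTP processor; it is shown in Python only for clarity. The instance URL, API version, and token value are assumptions: a real access token must first be obtained through one of Salesforce’s OAuth flows.

```python
import requests

INSTANCE_URL = "https://na1.salesforce.com"     # hypothetical SFDC instance
ACCESS_TOKEN = "00D...<redacted>"               # obtained via an OAuth flow
API_VERSION = "v38.0"                           # assumed API version

# Run a SOQL query against the Salesforce REST API query endpoint.
resp = requests.get(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/query",
    params={"q": "SELECT Id, Name FROM Account LIMIT 10"},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
resp.raise_for_status()
for record in resp.json()["records"]:
    print(record["Id"], record["Name"])
```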
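And for questions 4 and 6, a minimal sketch of reusing flow logic across clusters as a template: uploading a previously exported template XML through the NiFi 1.x REST API. The host name is hypothetical, and an unsecured cluster is assumed (a secured cluster would additionally need a token or certificates).

```python
import requests

NIFI = "http://nifi1.example.com:8080/nifi-api"  # hypothetical NiFi host

# 1. Resolve the id of the root process group ("root" is a supported alias).
root_id = requests.get(f"{NIFI}/process-groups/root").json()["id"]

# 2. Upload an exported template XML into that process group.
with open("my_flow_template.xml", "rb") as f:
    r = requests.post(
        f"{NIFI}/process-groups/{root_id}/templates/upload",
        files={"template": f},   # multipart/form-data upload
    )
r.raise_for_status()
print("Template uploaded")
```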

[Figure: HDF Ambari memory and CPU settings]
