Maximize the Value of Data-in-Motion with Big Data from the Internet of Things

Hortonworks DataFlow (HDF)

Hortonworks DataFlow (HDF) provides the only end-to-end platform that collects, curates, analyzes and acts on data in real time, on-premises or in the cloud, with a drag-and-drop visual interface. HDF is an integrated solution with Apache NiFi/MiNiFi, Apache Kafka, Apache Storm and Druid.

The HDF streaming real-time data analytics platform includes data flow management systems, stream processing, and enterprise services.

Powering the Future of Data

HDF Data-in-Motion Platform
Components: Kafka, Storm, NiFi, MiNiFi, Ranger, Ambari, Knox

Three Major Components of Hortonworks DataFlow


Easy, Secure, and Reliable Way to Manage Data Flow 

Collect and manipulate Internet of Things (IoT) big data flows securely and efficiently, with real-time operational visibility, control, and management.

Immediate and Continuous Insights  

Build streaming analytics applications in minutes to capture perishable insights in real time without writing a single line of code.

Corporate Governance, Security and Operations 

Manage the HDF and HDP ecosystem with a comprehensive management panel for provisioning, monitoring, and governance.

Integrated data-source agnostic collection platform

HDF has full-featured data collection capabilities that are agnostic to the streaming data source and integrated with over 220 processors. Big Data from the Internet of Things can be collected from dynamic and distributed sources of differing formats, schemas, protocols, speeds and sizes, and from source types such as machines, geolocation devices, click streams, files, social feeds, log files and videos.

Powerful Data Collection


With HDF, data collection is no longer a tedious process. You can manage data in full flight with a visual control panel to adjust sources, join and split streams, and prioritize data flow. HDF can also add contextual data to your streams for more complete analysis and insight. The always-on data provenance and audit trails support security and governance compliance, as well as real-time troubleshooting when necessary. Integrated with Apache NiFi, MiNiFi, Kafka and Storm, HDF is ready for high-volume event processing for immediate analysis and action. Kafka accommodates differing rates of data creation and delivery, while Storm provides real-time streaming data analytics and immediate insights at massive scale.
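The two ideas in this paragraph, buffering between producers and consumers that run at different rates and prioritizing data in flight, can be illustrated with a minimal Python sketch. The `PriorityBuffer` class and its `offer` and `poll` methods are hypothetical names invented for this illustration. They are not part of HDF, NiFi, or Kafka, and the sketch says nothing about how those products implement buffering.

```python
import heapq
import itertools


class PriorityBuffer:
    """Hypothetical bounded buffer for illustration; not an HDF, NiFi, or Kafka API."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay first-in, first-out.
        self._counter = itertools.count()

    def offer(self, priority, event):
        """Queue an event; return False (back-pressure signal) when the buffer is full."""
        if len(self._heap) >= self.capacity:
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), event))
        return True

    def poll(self):
        """Return the most urgent event (lowest priority number), or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


# A fast producer offers four events to a buffer that holds three.
buffer = PriorityBuffer(capacity=3)
incoming = [(5, "log line"), (1, "alert"), (5, "click"), (1, "late alert")]
accepted = [buffer.offer(priority, event) for priority, event in incoming]

# A slower consumer drains the buffer later, most urgent event first.
drained = [buffer.poll() for _ in range(3)]
```

In this sketch the fourth event is rejected because the buffer is full, which stands in for congestion handling, and the alert is delivered ahead of the lower-priority events that arrived before it.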

Real-Time Data Flow Management


HDF secures end-to-end data flow and routing from source to destination with discrete user authorization and detailed, real-time visual chain of custody. Use the visual user interface of HDF to encrypt streaming data, route it to Kafka, configure buffers and manage congestion so that data can be dynamically prioritized and securely sent. HDF enables role-based data access that allows enterprises to dynamically and securely share select pieces of pertinent data. HDF can easily deploy flow management and streaming applications in a Kerberized environment without much operational overhead.

Enterprise-Grade Security


HDF includes a complete streaming analytics module, Streaming Analytics Manager (SAM), for building streaming analytics applications that perform event correlation, context enrichment, complex pattern matching and analytical aggregations, and that create alerts and notifications when insights are discovered. SAM lets application developers, DevOps teams and business analysts build, collaborate on, analyze, deploy, and manage streaming applications in minutes without writing a single line of code. Analysts use pre-built charts to quickly build analyses and create dashboards, while DevOps teams can manage and monitor application performance right out of the box.

Developers can experiment with creating SAM apps using mock data and create unit tests for SAM apps using the new SAM “Test Mode”. The new SAM Operations Module gives users the ability to easily test, debug, troubleshoot, and monitor deployed applications, making a running application as easy to operate as it was to build.


HDF includes Schema Registry, a central schema repository that allows analytics applications to flexibly interact with each other. It enables users to save, edit, or retrieve schemas for the data they need. It also lets a schema be attached to each piece of data without incurring additional overhead, for greater operational efficiency. With schema version management, data consumers and data producers can evolve at different rates. And, through schema validation, data quality is greatly improved. A central schema registry also provides for greater governance of how data is used. Schema Registry is integrated with Apache NiFi and HDF Streaming Analytics Manager.
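To make schema versioning and validation concrete, here is a minimal in-memory sketch in Python. The `SchemaRegistry` class, its methods, and the `truck_event` schema are all hypothetical and exist only for this illustration. This is not the HDF Schema Registry API, and it represents schemas as plain field-to-type mappings rather than any real schema format.

```python
class SchemaRegistry:
    """Hypothetical in-memory registry for illustration; not the HDF Schema Registry API."""

    def __init__(self):
        # Schema name -> list of schemas; list index 0 holds version 1.
        self._versions = {}

    def register(self, name, schema):
        """Store a new version of a schema and return its version number."""
        self._versions.setdefault(name, []).append(dict(schema))
        return len(self._versions[name])

    def get(self, name, version=None):
        """Fetch a specific version, or the latest when version is None."""
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]

    def validate(self, name, record, version=None):
        """Check that the record has exactly the schema's fields with matching types."""
        schema = self.get(name, version)
        if set(record) != set(schema):
            return False
        return all(isinstance(record[field], ftype) for field, ftype in schema.items())


registry = SchemaRegistry()
v1 = registry.register("truck_event", {"truck_id": int, "speed": float})
v2 = registry.register("truck_event", {"truck_id": int, "speed": float, "route": str})

# A producer still on version 1 and a producer already on version 2.
old_record = {"truck_id": 7, "speed": 61.5}
new_record = {"truck_id": 7, "speed": 61.5, "route": "I-80"}

old_ok_against_v1 = registry.validate("truck_event", old_record, version=1)
old_ok_against_latest = registry.validate("truck_event", old_record)
new_ok_against_latest = registry.validate("truck_event", new_record)
```

Because each record is checked against an explicit version, the older producer keeps validating against version 1 while newer producers and consumers move to version 2, which is the sense in which the two sides can evolve at different rates.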


Apache NiFi Registry, a new Apache sub-project now included within HDF Enterprise Management Services, facilitates the development, management and portability of data flows. Core to its functionality is the ability to abstract data flow schemas and programs to enable users to track and monitor data flow changes at a more granular level. Data flow schemas are stored in a shared repository that allows for easy sharing on a global basis as well as versioning of schemas.

Through this shared repository, the export and import of data flows allow easy porting and enable smooth migration of data flows from one environment to another. The functionality significantly improves the storage, control, and management of versioned flows, further shortening the software development life cycle and accelerating application deployment to achieve faster time to value.
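The workflow described here (commit versions of a flow, inspect what changed between versions, then export from one environment and import into another) can be sketched in a few lines of Python. The `FlowRegistry` class, the flow name `ingest_logs`, and the flow fields are hypothetical and chosen only for illustration. This is not the Apache NiFi Registry API, and real flow definitions are far richer than the flat dictionaries used here.

```python
import json


class FlowRegistry:
    """Hypothetical versioned flow store for illustration; not the NiFi Registry API."""

    def __init__(self):
        # Flow name -> list of JSON snapshots; list index 0 holds version 1.
        self._flows = {}

    def commit(self, flow_name, definition):
        """Save a snapshot of the flow definition and return its version number."""
        snapshot = json.dumps(definition, sort_keys=True)
        self._flows.setdefault(flow_name, []).append(snapshot)
        return len(self._flows[flow_name])

    def export(self, flow_name, version):
        """Return the stored JSON snapshot for one version of a flow."""
        return self._flows[flow_name][version - 1]

    def changed_keys(self, flow_name, old, new):
        """List the top-level settings that differ between two versions."""
        a = json.loads(self.export(flow_name, old))
        b = json.loads(self.export(flow_name, new))
        return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


# Two versions of a flow are committed in a development environment.
dev = FlowRegistry()
dev.commit("ingest_logs", {"source": "tail_file", "target": "kafka", "batch_size": 100})
dev.commit("ingest_logs", {"source": "tail_file", "target": "kafka", "batch_size": 500})
diff = dev.changed_keys("ingest_logs", 1, 2)

# Version 2 is exported as JSON and imported into a separate production registry.
prod = FlowRegistry()
exported = dev.export("ingest_logs", 2)
prod_version = prod.commit("ingest_logs", json.loads(exported))
```

Storing each version as a serialized snapshot is one simple design choice among many. It keeps export and import trivial, at the cost of storing full copies rather than deltas.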

What's New in HDF

Speed up data flow management, development and operations

  • Develop, manage and port data flows easily using Apache NiFi Registry
  • Track and monitor data flow changes at a more granular level
  • Share and version schemas using the shared repository for data flow schemas
  • Export and import data flows easily from one environment to another

Increase developer productivity

  • Improve streaming data operations in Hortonworks Streaming Analytics Manager (SAM)
  • Create SAM apps or unit tests for SAM apps using the new SAM “Test Mode”
  • Test, debug, troubleshoot, and monitor deployed applications with the new SAM Operations module

Experience tight integration with Kafka 1.0

  • Install, configure, manage, upgrade, monitor, and secure Kafka 1.0 clusters with Ambari
  • Enhance data governance and lineage by managing access control policies with Apache Ranger
  • Leverage new processors in NiFi and SAM to use Kafka 1.0 features including message headers and transactions

HDF User Guides

Get HDF release notes; guides for users, developers and getting started.


The industry’s best support for Apache NiFi, Kafka and Storm in the enterprise. Connect with our team of experts to help guide your journey.


Real-world training from the Big Data experts. Available in person or on-demand whenever you need us.