Get Started with Hortonworks DataFlow
Collect and manipulate Internet of Things (IoT) big data flows securely and efficiently, with real-time operational visibility, control, and management.
Build streaming analytics applications in minutes to capture perishable insights in real time without writing a single line of code.
Manage the HDF and HDP ecosystem with a comprehensive management panel for provisioning, monitoring, and governance.
HDF has full-featured data collection capabilities that are streaming-data agnostic and integrated with over 220 processors. Big data from the Internet of Things can be collected from dynamic, distributed sources of differing formats, schemas, protocols, speeds and sizes, including machines, geolocation devices, click streams, files, social feeds, log files and videos.
With HDF, data collection is no longer a tedious process. You can manage data in full flight with a visual control panel to adjust sources, join and split streams, and prioritize data flow. HDF can also add contextual data to your streams for more complete analysis and insight. Always-on data provenance and audit trails support security, governance compliance, and real-time troubleshooting. Integrated with Apache NiFi, MiNiFi, Kafka and Storm, HDF is ready for high-volume event processing for immediate analysis and action. Kafka accommodates differing rates of data creation and delivery, while Storm provides real-time streaming analytics and immediate insights at massive scale.
HDF secures end-to-end data flow and routing from source to destination with discrete user authorization and detailed, real-time visual chain of custody. Use the visual user interface of HDF to encrypt streaming data, route it to Kafka, configure buffers and manage congestion so that data can be dynamically prioritized and securely sent. HDF enables role-based data access that allows enterprises to dynamically and securely share select pieces of pertinent data. HDF can easily deploy flow management and streaming applications in a Kerberized environment without much operational overhead.
HDF includes a complete streaming analytics module, Streaming Analytics Manager (SAM), for building streaming analytics applications that perform event correlation, context enrichment, complex pattern matching and analytical aggregations, and that create alerts and notifications when insights are discovered. SAM lets application developers, DevOps and business analysts build, collaborate on, analyze, deploy, and manage applications in minutes without writing a single line of code. Analysts use pre-built charts to quickly build analyses and create dashboards, while DevOps can manage and monitor application performance right out of the box.
Developers can experiment with creating SAM apps using mock data and create unit tests for SAM apps using the new SAM “Test Mode”. The new SAM Operations Module gives users the ability to easily test, debug, troubleshoot, and monitor deployed applications, making it as easy to operate a running application as to build it.
HDF includes Schema Registry, a central schema repository that allows analytics applications to flexibly interact with each other. It enables users to save, edit, or retrieve schemas for the data they need. It also allows schemas to be attached to each piece of data without incurring additional overhead, for greater operational efficiency. With schema version management, data consumers and data producers can evolve at different rates, and schema validation greatly improves data quality. A central schema registry also provides greater governance over how data is used. Schema Registry is integrated with Apache NiFi and HDF Streaming Analytics Manager.
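To make the idea of schema versioning and validation concrete, here is a minimal in-memory sketch. It is not the HDF Schema Registry API: the `InMemorySchemaRegistry` class, its methods, and the `truck_event` schema are all hypothetical, and a schema is assumed to be a plain mapping of field names to Python types.

```python
from dataclasses import dataclass, field


@dataclass
class InMemorySchemaRegistry:
    """Hypothetical toy registry for illustration; not the HDF Schema Registry API."""

    # Maps a schema name to its list of versions (index 0 holds version 1).
    _versions: dict = field(default_factory=dict)

    def register(self, name, fields):
        """Store a new version of the named schema and return its version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(dict(fields))
        return len(versions)

    def get(self, name, version=None):
        """Return the requested version, or the latest one if no version is given."""
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]

    def validate(self, name, record, version=None):
        """Check that the record has exactly the schema's fields, with matching types."""
        schema = self.get(name, version)
        return set(record) == set(schema) and all(
            isinstance(record[key], expected) for key, expected in schema.items()
        )


# Assumed example data: a producer adds a "route" field in version 2.
registry = InMemorySchemaRegistry()
v1 = registry.register("truck_event", {"truck_id": int, "speed": float})
v2 = registry.register("truck_event", {"truck_id": int, "speed": float, "route": str})

old_record = {"truck_id": 7, "speed": 61.5}
new_record = {"truck_id": 7, "speed": 61.5, "route": "I-80"}

# A consumer still pinned to version 1 can keep validating old-format records.
old_ok_against_v1 = registry.validate("truck_event", old_record, version=v1)
new_ok_against_latest = registry.validate("truck_event", new_record)
old_ok_against_latest = registry.validate("truck_event", old_record)
```

Because each version is retained, a consumer can pin the version it understands while producers move to a newer one, which is the sense in which the two sides can evolve at different rates.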
Apache NiFi Registry, a new Apache sub-project now included within HDF Enterprise Management Services, facilitates the development, management and portability of data flows. Core to its functionality is the ability to abstract data flow schemas and programs to enable users to track and monitor data flow changes at a more granular level. Data flow schemas are stored in a shared repository that allows for easy sharing on a global basis as well as versioning of schemas.
Through this, the export and import of data flows allows easy porting and enables smooth migration of data flows from one environment to another. This functionality significantly improves the storage, control, and management of versioned flows, further shortening the software development life cycle and accelerating application deployment to achieve faster time to value.
Get HDF release notes; guides for users, developers and getting started.
The industry’s best support for Apache NiFi, Kafka and Storm in the enterprise. Connect with our team of experts to help guide your journey.
Real-world training from the Big Data experts. Available in person or on-demand whenever you need us.