May 09, 2017

HDF Series Part 1: Hortonworks’ Thoughts on Building a Successful Streaming Analytics Platform

As part of the product management leadership team at Hortonworks, there is nothing more valuable than talking directly with customers and learning about their successes, challenges, and struggles implementing their big data and analytics use cases with HDP and HDF. These conversations provide more insight than any analyst report, white paper, or market study.

In my 4+ years at Hortonworks, I have had many opportunities for face time with our more than 1000 customers. These conversations have strongly influenced how we build enterprise software products that are easier to use.

There have been a handful of moments with customers that leave an indelible mark, reshaping how one thinks about a problem set. One of those moments occurred a few months ago with a customer who was using Apache NiFi as part of the Hortonworks DataFlow (HDF) platform to ingest, route/move, enrich, and transform data from edge devices like cable modems, voice over IP phones, and home security systems. HDF was transformative for this customer and they especially appreciated NiFi’s compelling user experience to greatly reduce operational effort for data ingestion and flow management.

I posed the following question to the customer:

“Where did you experience pain when implementing this use case? Where can we continue to innovate in HDF to ease those pains?”

The response went something like this:

“Using NiFi with its rich UI has been a refreshingly delightful experience for us as we build flow management applications. However, we desperately need the same type of experience when building streaming analytics apps. Flow management only gets us halfway there. We need a rich UI to build analytical apps that operate on the stream.”

The above response has been echoed by almost every one of our customers, and it has strongly influenced the strategic direction, efforts, and investments in the Hortonworks data-in-motion platform: Hortonworks DataFlow (HDF). We have gleaned two insights from the customer’s response:

  • Building end-to-end data-in-motion use cases requires both flow management and streaming analytics capabilities.
  • Building streaming analytics apps must get easier.

Data-In-Motion Solutions Require both Flow Management and Stream Analytics

What is the difference between flow management and streaming analytics?

  • Flow management provides an easy, secure, and reliable way to get the data you need from anywhere (edge, cloud, data center) to any downstream system with intelligence (routing, transformation, filtering, bi-directional communication).
  • Streaming analytics provides immediate and continuous insights using aggregations over windows, pattern matching, predictive and prescriptive analytics, and so on. Streaming analytics is part of a superset of capabilities provided by stream processing.
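To make the distinction concrete, here is a minimal sketch in plain Python of what a flow-management step does conceptually: route, filter, and enrich records on their way to downstream systems. This is illustrative only, not NiFi or HDF code; the record fields and destination names are hypothetical.

```python
def route(record):
    """Pick a downstream destination based on record attributes."""
    if record.get("device_type") == "cable_modem":
        return "modem_topic"
    if record.get("priority", 0) >= 5:
        return "alerts_topic"
    return "default_topic"

def enrich(record):
    """Attach metadata a downstream consumer will need."""
    enriched = dict(record)
    enriched["source"] = "edge"
    return enriched

records = [
    {"device_type": "cable_modem", "value": 42},
    {"device_type": "voip_phone", "priority": 7},
]

# Each record is enriched and routed to a destination independently;
# no state is kept across records, which is the hallmark of flow
# management as opposed to streaming analytics.
routed = [(route(r), enrich(r)) for r in records]
```

In a real deployment NiFi expresses these steps as processors in a visual flow rather than as code, but the per-record, largely stateless nature of the work is the same.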

As the customer above noted, one needs both capabilities to be successful. This is why HDF was expanded in mid-2016 to offer stream processing in the HDF 2.0 release with Apache Storm and Apache Kafka. The diagram below summarizes this expansion.

Building Stream Analytics Apps Must Get Easier

Simply adding Apache Storm and Kafka to HDF does not address the second key point: building stream analytics quickly and easily. Customers often cite the following key challenges:

  1. Building stream analytics apps requires specialized skill sets that most enterprise organizations do not have today.
  2. Stream analytics apps require a considerable amount of low level programming, testing, and tuning to bring to production.
  3. It takes a lot of time to design, develop, test, and deploy into production.
  4. Key streaming basics such as joining/splitting streams, aggregations over windows of time, and pattern matching are difficult to implement.
  5. Customers do not want to code complex stream analytics apps.
  6. While traditional mature streaming vendors (IBM Streams, Tibco, SAS, SAP) solve challenges 1-5, they are cost prohibitive, proprietary, and do not provide scale-out architectures.
  7. No truly open source tool solves challenges 1-5 today.
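Challenge 4 above is worth illustrating. An aggregation over a window of time sounds simple, but in a streaming engine it means maintaining state per window and emitting results as windows close. The sketch below shows a tumbling-window average over a finite batch in plain Python; it is a conceptual illustration only, not Storm or Kafka code, and the timestamps, values, and window size are made-up assumptions.

```python
from collections import defaultdict

def tumbling_window_avg(events, window_sec=60):
    """Group (timestamp_sec, value) events into fixed, non-overlapping
    windows of window_sec seconds and compute the average per window."""
    windows = defaultdict(list)
    for ts, value in events:
        # Integer division maps each timestamp to its window index.
        windows[ts // window_sec].append(value)
    # Key each result by the window's start time.
    return {w * window_sec: sum(vs) / len(vs)
            for w, vs in sorted(windows.items())}

events = [(5, 10.0), (30, 20.0), (65, 30.0), (90, 50.0)]
averages = tumbling_window_avg(events)
# averages == {0: 15.0, 60: 40.0}
```

A production streaming engine must do this continuously over an unbounded stream while handling out-of-order and late-arriving events, which is exactly the low-level complexity customers say they do not want to hand-code.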

How do we address these challenges? Over the last six months, the Hortonworks Stream Processing engineering and product management teams have been working on a brand new set of powerful components that address each of these challenges. The sections below outline some of the fundamental principles driving this initiative.


Next Generation Streaming Analytics Solution Needs to Cater to 3 Different User Personas

Two design principles drove this effort. First, this new set of components should allow users to design, develop, deploy, and manage complex streaming analytics apps without knowing the complexities of the underlying streaming engine. Developers should be able to build complex streaming analytics apps writing as little code as possible. Second, the toolsets need to cater to three important personas within the organization:

  • App Developers — Design, develop, and deploy streaming apps using a drag and drop visual paradigm.
  • Operations Team — Create abstractions to big data services for App Developers, and supply tooling to help operational users deploy, monitor, and manage streaming apps.
  • Business Analysts — Immediately access the streaming data and perform descriptive analytics on the streams using a powerful exploration platform.


What's Next?

Over the next few weeks, the Hortonworks engineering and product management teams will publish a series of blog posts providing more detail on this new tool and the other enterprise management services required for this new and exciting technology. Stay tuned!

Read the next blog post in this series: HDF Series Part 2: A Shared Schema Registry – What is it and Why is it important?
