November 01, 2017

Streaming Analytics Manager Extensibility

Streaming application development can be a complex process.  Streaming Analytics Manager (SAM) was developed to accelerate and simplify designing, building, and deploying streaming apps without writing a single line of code.  SAM enables this by:

  1. Providing a UI for designing and building streaming applications, so you don’t have to write code
  2. Providing a set of pre-built processors (220+) for quickly assembling an app

This may suffice for many applications, but for others it might get you only 80% of the way there.  So how do you complete the last 20% of your application?  SAM addresses this gap by providing four extension points.  These extension points let you plug in custom code for application requirements that the pre-built processors do not cover.

What are the four extension points, and how do you integrate your code with each?

1. UDFs – Projection Processors

A projection processor is a component that lets you apply user-defined functions (UDFs), either built-in or custom, to the fields of events flowing through it.  UDFs perform simple transformations on event streams.  The projection processor also selects (projects) the fields that make up its output schema.
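To make the idea concrete, here is a minimal sketch of a custom UDF and of what a projection step does with it. The `SimpleUdf` interface, the `MaskUdf` class, and the field names (`driverId`, `license`) are hypothetical stand-ins written for this illustration; they are not the SAM SDK types, so check the SDK documentation for the real interface and package names before writing an actual UDF.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an SDK UDF contract: one field value in, one value out.
interface SimpleUdf<I, O> {
    O evaluate(I input);
}

// Example custom UDF: masks everything except the last four characters of a string.
class MaskUdf implements SimpleUdf<String, String> {
    @Override
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        int masked = Math.max(0, input.length() - 4);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < masked; i++) {
            sb.append('*');
        }
        sb.append(input.substring(masked));
        return sb.toString();
    }
}

// Sketch of a projection: apply the UDF to one field and emit only the chosen fields.
class ProjectionSketch {
    static Map<String, Object> project(Map<String, Object> event,
                                       SimpleUdf<String, String> udf) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("driverId", event.get("driverId"));            // field passed through
        out.put("maskedLicense",
                udf.evaluate((String) event.get("license")));  // field transformed by the UDF
        return out;                                             // all other fields are dropped
    }
}
```

In SAM itself the projection is configured in the UI rather than written by hand; the `project` method above only illustrates the effect of picking fields and applying a function to one of them.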

2. UDAFs – Aggregate Processors

An aggregate processor is a component that lets you apply user-defined aggregate functions (UDAFs), either built-in or custom, to the fields of events over a window defined by time (event time or processing time) and/or count.  UDAFs let you add custom aggregate functions to SAM.  Once you create and register UDAFs, they are available for use in the Aggregate processor.
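An aggregate function typically follows an init / add / result pattern: create an empty state, fold each value in the window into it, then produce the final result. The sketch below shows that pattern for a mean. The `SimpleUdaf` interface and the `MeanUdaf`, `MeanState`, and `WindowSketch` names are hypothetical and defined here only for illustration; the actual SAM SDK interface may differ in naming and signatures.

```java
import java.util.List;

// Hypothetical stand-in for an SDK UDAF contract.
// A = running aggregate state, V = input value type, R = result type.
interface SimpleUdaf<A, V, R> {
    A init();                       // create an empty aggregate for a new window
    A add(A aggregate, V value);    // fold one value into the aggregate
    R result(A aggregate);          // produce the final value when the window closes
}

// Running state for a mean: sum and count of the values seen so far.
class MeanState {
    double sum;
    long count;
}

// Example custom UDAF: arithmetic mean of a numeric field.
class MeanUdaf implements SimpleUdaf<MeanState, Number, Double> {
    @Override
    public MeanState init() {
        return new MeanState();
    }

    @Override
    public MeanState add(MeanState state, Number value) {
        state.sum += value.doubleValue();
        state.count++;
        return state;
    }

    @Override
    public Double result(MeanState state) {
        if (state.count == 0) {
            return null;            // no values in the window
        }
        return state.sum / state.count;
    }
}

// Sketch of what an aggregate step does with the values collected in one window.
class WindowSketch {
    static <A, V, R> R aggregate(SimpleUdaf<A, V, R> udaf, List<V> windowValues) {
        A state = udaf.init();
        for (V value : windowValues) {
            state = udaf.add(state, value);
        }
        return udaf.result(state);
    }
}
```

Keeping the state (sum and count) separate from the result (the mean) is what lets the function be updated one event at a time as the window fills.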

3. Custom Processors

A custom processor is a component in SAM that provides extensibility by letting you plug in custom processing logic that is not handled by a built-in component.  You can register a custom processor once and reuse it in different SAM applications that need the same custom processing, possibly with different configuration values.  The custom processor implementation receives an event and is expected to return one or more events, possibly with different fields, after processing the input event.  You create a custom processor using the SDK and package it into a jar file with all of its dependencies.
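The lifecycle described above (configure once, then turn each incoming event into a list of output events) can be sketched as follows. `SimpleProcessor`, `ThresholdEnricher`, the `speedLimit` configuration key, and the event field names are all hypothetical and exist only in this example; events are modeled as plain maps rather than the SDK's event type. Dropping a malformed event by returning an empty list is a choice made for this sketch, not documented SAM behavior.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an SDK custom-processor contract.
interface SimpleProcessor {
    void initialize(Map<String, Object> config);                   // called once with config values
    List<Map<String, Object>> process(Map<String, Object> event);  // called for each event
    void cleanup();                                                // release any resources
}

// Example processor: adds a "speeding" flag based on a configurable limit.
class ThresholdEnricher implements SimpleProcessor {
    private double speedLimit;

    @Override
    public void initialize(Map<String, Object> config) {
        // The same registered processor can be reused with a different limit per application.
        speedLimit = ((Number) config.get("speedLimit")).doubleValue();
    }

    @Override
    public List<Map<String, Object>> process(Map<String, Object> event) {
        Object speed = event.get("speed");
        if (!(speed instanceof Number)) {
            return Collections.emptyList();                // sketch-only choice: drop malformed events
        }
        Map<String, Object> out = new HashMap<>(event);    // copy so the input event is not mutated
        out.put("speeding", ((Number) speed).doubleValue() > speedLimit);
        return Collections.singletonList(out);
    }

    @Override
    public void cleanup() {
        // Nothing to release in this sketch.
    }
}
```

A real implementation would replace the threshold check with whatever logic the application needs, for example a lookup against an external system opened in `initialize` and closed in `cleanup`.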

4. Custom Sources/Sinks

If you want to leverage existing code and reuse it in SAM applications, you can now do so with custom sources and sinks.  A good example is an existing Apache Storm spout that interacts with a proprietary system.  Through a simple process of building and registering the spout as a custom SAM resource, as demonstrated in this article, the component then appears in the palette on the left-hand side of the SAM application builder canvas.  You can then drag and drop the custom source onto the canvas when building a new SAM app.  This capability helps avoid code rewrites, increases user productivity, and accelerates application development for faster time to value.

What’s the difference at a high level between the four extensions and when do you use which?

UDFs and UDAFs are suitable for cases where you want to apply a function to one or more fields of the event and return the result.  Examples include trim and concat, which are available as built-in UDFs.  Custom processors are more suitable for cases where you want to take the entire event, process it, and return zero or more events corresponding to the input event.  The processing could be anything.  One example is looking up an external database and enriching the event with more information.

The end result: these four extension points let you complete your app by filling the gaps that the pre-built processors leave.  And it’s so easy to do!


 To get more information, please check out the following HCC articles: 


