Last week, in Part 3 of this blog series, we announced the GA of HDF 3.0 and let the cat out of the bag by introducing a new open source component called Streaming Analytics Manager (SAM), an exciting new technology that helps developers, operators, and business analysts build, deploy, manage, and monitor streaming analytics apps. SAM consists of three modules, each catering to a different persona within an organization, as the diagram below illustrates.
In this blog, we will focus on the Stream Builder module, which is aimed at app developers within the enterprise.
Having worked over the last few years with users who built streaming apps on engines such as Storm, Spark Streaming, and others, we have seen most large enterprise customers face the following challenges:
SAM’s Stream Builder aims to solve each of these challenges. Read on!
A typical stream app works with a number of different big data services to create streams, store events, and do analytics. Common big data services used in streaming apps include Kafka, HDFS, HBase, Spark, Cassandra, Solr, Elasticsearch, and others. The app developer building the stream app has to know the internals of how to work with each of these services. For example, to connect to a Kafka cluster, the app developer has to find out the host and port of each Kafka broker. In other words, service discovery and configuration need to be easier.
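To make the pain concrete, here is a minimal sketch of the kind of per-service connection details an app developer would otherwise have to discover and keep current by hand. All hosts and ports below are hypothetical placeholders, not values from any real cluster:

```python
# Connection details a hand-rolled streaming app must track for each
# big data service it touches. Every host/port here is a hypothetical
# placeholder that would have to be discovered and kept up to date.
SERVICE_CONFIG = {
    "kafka": {
        # Each broker's host:port must be known up front.
        "bootstrap.servers": "broker1.example.com:6667,broker2.example.com:6667",
    },
    "hbase": {
        "zookeeper.quorum": "zk1.example.com,zk2.example.com,zk3.example.com",
        "zookeeper.port": 2181,
    },
    "hdfs": {
        "fs.defaultFS": "hdfs://namenode.example.com:8020",
    },
}

def kafka_bootstrap_servers(config):
    """Look up the Kafka broker list the way a hand-rolled app would."""
    return config["kafka"]["bootstrap.servers"]
```

If a broker moves or a new one is added, every app carrying a copy of this configuration has to be updated; this is exactly the bookkeeping SAM's Service Pools take off the developer's plate.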
SAM solves these problems with the powerful Service Pool feature. The following are the fundamental constructs of Service Pools.
With SAM, adding Service Pools is as easy as entering the Ambari REST endpoint and clicking ‘Auto Add’.
When a service pool is created, all of the configuration needed to manage and connect to the big data services in the pool is imported from Ambari into SAM. If a configuration associated with a service changes in Ambari, the service pool can adopt the new configuration by refreshing the pool, as indicated in the following diagram.
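Conceptually, the import boils down to reading the cluster's service listing from Ambari's REST API and caching each service's configuration; a refresh simply re-runs the import. The sketch below illustrates that idea against a simplified, hypothetical Ambari-style response (the real payload is richer, and the parsing here is an assumption, not SAM's actual code):

```python
import json

# Simplified, hypothetical shape of an Ambari services listing
# (Ambari exposes cluster state under /api/v1/clusters/<cluster>).
SAMPLE_AMBARI_RESPONSE = json.dumps({
    "items": [
        {"ServiceInfo": {"service_name": "KAFKA"}},
        {"ServiceInfo": {"service_name": "HDFS"}},
        {"ServiceInfo": {"service_name": "HBASE"}},
    ]
})

def import_service_pool(ambari_services_json):
    """Build a service pool (service name -> config placeholder) from an
    Ambari-style services listing. Refreshing the pool means re-running
    this import against Ambari to pick up changed configuration."""
    items = json.loads(ambari_services_json)["items"]
    return {item["ServiceInfo"]["service_name"]: {} for item in items}

pool = import_service_pool(SAMPLE_AMBARI_RESPONSE)
```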
An environment is a named entity that represents a set of services chosen from different service pools. When a stream app is assigned to an environment, the app can use the services associated with that environment. To create an environment, give it a name and select services for it from across the different service pools.
You can create different environments based on your needs.
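Put another way, an environment is just a named selection of services, each drawn from some service pool. A small illustrative sketch (the pool and environment names below are made up for the example):

```python
# An environment maps each selected service to the service pool it was
# chosen from. Names here are illustrative, not from a real deployment.
environments = {
    "dev": {
        "kafka": "dev-pool",
        "hdfs": "dev-pool",
    },
    "prod-analytics": {
        "kafka": "prod-pool-east",
        "hbase": "prod-pool-east",
    },
}

def services_for(env_name):
    """List the services a stream app assigned to this environment may use."""
    return sorted(environments[env_name])
```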
When creating a new stream app, give it a name and select the Environment. The app can then work with each of the big data services within that Environment.
The app developer uses the Stream Builder canvas to build streaming apps by dragging components from the palette, configuring them, connecting components together, and then deploying the app.
There are three types of components on the canvas palette: Sources, Processors, and Sinks. SAM provides a number of out-of-the-box components as well as a simple SDK with which you can register your own components and have them added to the palette. The list of processors supported out of the box includes:
In Part 2 of this blog series, we introduced the need for a Schema Registry, a central schema repository that allows applications and HDF components (NiFi, Storm, Kafka, and others) to flexibly interact with each other. Streaming apps require a schema and hence SAM has first class integration with the Hortonworks Schema Registry.
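The core idea behind that integration is that each component looks up the schema registered for its stream and validates events against it, rather than embedding schema knowledge in every app. A minimal sketch of that idea follows; the registry contents, topic name, and field types are hypothetical, and the real Schema Registry stores Avro schemas behind a REST API rather than a Python dict:

```python
# Toy stand-in for a schema registry: topic name -> expected fields and
# types. The "truck-events" schema here is hypothetical.
REGISTRY = {
    "truck-events": {"driverId": int, "speed": int, "route": str},
}

def validate(topic, event):
    """Check an event against the schema registered for its topic:
    same field names, and each value of the registered type."""
    schema = REGISTRY[topic]
    return set(event) == set(schema) and all(
        isinstance(event[field], ftype) for field, ftype in schema.items()
    )
```

Because every producer and consumer consults the same registry, a schema change is made in one place instead of being re-declared inside each app, which is what makes the "first class" integration valuable.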
The first step in creating an app is to start with a source component. A common source component used by many customers is a Kafka topic. In the diagram below, we drag the Kafka component onto the canvas.
To configure the Kafka component, double-click it. Below is the configuration panel for the Kafka component.
The above configuration for the Kafka topic shows two powerful SAM integrations:
The diagram below shows a full end-to-end stream app built using Stream Builder without writing any code.
The above app showcases the following:
To see a short step-by-step video of how the above app was created using SAM’s Stream Builder, watch the following video: Create a streaming analytics app in 10min. Also check out the detailed steps to recreate the app in the following doc: Getting Started with Streaming Analytics.
The next blog in this series will walk through the Stream Operations module of SAM. Stay tuned!