Last week, as part of the HDF 3.1 Blog Series, we talked about support for Apache Kafka 1.0 and the powerful HDF integrations including Apache NiFi’s Kafka processors, Apache Ambari for provisioning/management/monitoring and Ranger for access control policies and audit for Apache Kafka.
Today, in this fourth part of the series, we discuss the innovations added to Hortonworks Streaming Analytics Manager, aka SAM, specifically around tooling for developers to test streaming analytics apps.
Last summer, when SAM was unveiled as part of HDF 3.0, the fundamental problem we were trying to solve for our customers was to help them build streaming analytics apps faster. It was to address the following sentiment expressed by so many of our customers:
“Using NiFi with its rich UI has been a refreshingly delightful experience for us as we build flow management applications. However, we desperately need the same type of experience when building streaming analytics apps. Flow management only gets us halfway there. We need a rich UI to build analytical apps that operate on the stream.”
As our customers have started to use SAM to build streaming analytics apps in verticals ranging from transportation and healthcare to insurance, we are seeing app dev teams and business analysts deliver value to the business faster.
To demonstrate this, let's build on the trucking company's use case that we presented in the last blog. This trucking company wants to build real-time data flow apps to ingest the streams, perform routing, transformations and enrichment, and deliver them to downstream consumers for streaming analytics. In the previous blog, we discussed how Apache MiNiFi, NiFi and Kafka combined can implement the flow requirements of edge data collection, routing, transformation, enrichment and delivery of the streams to downstream consumers for streaming analytics. SAM can then be used to implement streaming analytics requirements like the following:
The following showcases how SAM implements each of these requirements.
As the above SAM app showcases, building complex streaming analytics apps using constructs like joins across streams, aggregations over time windows, enrichment, normalization and executing machine learning models becomes easier.
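To make one of these constructs concrete, the sketch below shows the logic behind an aggregation over a time window, one of the operations SAM lets you wire up visually. It is purely illustrative: the field names (`driverId`, `eventTime`, `speed`) and the tumbling-window semantics are assumptions for the example, not SAM's actual schema or implementation.

```python
from collections import defaultdict

def tumbling_window_avg(events, window_secs=60):
    """Average speed per driver over tumbling time windows.

    `events` is a list of dicts with hypothetical fields `driverId`,
    `eventTime` (epoch seconds) and `speed`; the names are illustrative,
    not SAM's actual event schema.
    """
    buckets = defaultdict(list)
    for e in events:
        # Align each event to the start of its tumbling window.
        window_start = e["eventTime"] - (e["eventTime"] % window_secs)
        buckets[(e["driverId"], window_start)].append(e["speed"])
    return {key: sum(speeds) / len(speeds) for key, speeds in buckets.items()}

events = [
    {"driverId": 11, "eventTime": 100, "speed": 60},
    {"driverId": 11, "eventTime": 110, "speed": 80},
    {"driverId": 11, "eventTime": 130, "speed": 50},
]
averages = tumbling_window_avg(events, window_secs=60)
# The first two events fall in the [60, 120) window, the third in [120, 180).
```

In SAM, the equivalent is configured on a processor in the canvas rather than written by hand; the sketch only clarifies what such a windowed aggregation computes.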
A common challenge that we often hear from app dev teams who specialize in implementing streaming applications is the following:
“It’s difficult to test my streaming analytics apps locally before deploying to a cluster. There needs to be better tooling to help developers with unit and integration testing of streaming apps.”
SAM’s new Test Mode solves this problem by enabling developers to test SAM apps by mocking out sources using test data and stubbing out the destination sinks.
To showcase SAM's Test Mode, assume that for the above truck streaming analytics app we need to verify the following assertions.
The following demonstrates how to create the test case in SAM to validate the assertions.
When the test case is executed, SAM displays the output at each component/processor in the app as it flows across your application. This enables the developer to validate the outputs visually for different test cases. The following is the result of SAM test case execution.
As the above diagram illustrates, SAM's Test Mode allows the developer to validate/test visually before deploying to a streaming cluster. The feedback from customers has been that SAM Test Mode is helpful for testing, but what they really want is the following:
SAM addresses each of these needs because every capability in SAM, including Test Mode, is powered and exposed via SAM REST services. Hence, the seven assertions above can be written as a JUnit test using SAM Test Mode's RESTful services as shown below.
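The general shape of such a test is: submit the test case to SAM's REST service, wait for the run to finish, and assert on the output of each component. Sketched here in Python rather than JUnit, and building the request without sending it so it runs with no live cluster; the host, port and endpoint path are assumptions modeled on a catalog-style REST layout, so consult the linked JUnit class for the real calls.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request

# Hypothetical SAM host/port; the real values depend on your deployment.
SAM_BASE = "http://sam-host:7777/api/v1/catalog/"

def build_test_run_request(app_id, test_case):
    """Build (but do not send) the REST call that kicks off a Test Mode run.

    The endpoint path is an assumption for illustration; a JUnit test would
    issue this POST, poll for completion, and assert on each component's
    recorded output.
    """
    url = urljoin(SAM_BASE, f"topologies/{app_id}/actions/testrun")
    body = json.dumps(test_case).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_test_run_request(4, {"name": "speeding-driver-detected"})
```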
To see the full JUnit Test class, see here.
Most enterprise organizations have standards on continuous integration and delivery pipelines for custom applications to increase software quality and decrease the time to market. One of the fundamental design principles of SAM is to expose all capabilities via REST. This allows customers to easily build CI and CD pipelines for SAM applications.
The CI/CD pipeline can be implemented with SAM REST using Jenkins. The following demonstrates this.
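At a high level, each Jenkins stage is just another call against SAM REST. The sketch below lists one plausible sequence of stages and the calls they would make; the stage names and endpoint paths are assumptions for illustration, and the artifacts linked below contain the actual pipeline scripts.

```python
def pipeline_steps(app_id):
    """Return a hypothetical sequence of (stage, HTTP method, path) tuples
    that a Jenkins CI/CD pipeline could issue against SAM REST services.

    The paths follow a catalog-style REST layout but are assumptions, not
    SAM's documented API.
    """
    return [
        # Export the app definition from the dev environment.
        ("export-from-dev", "GET", f"/api/v1/catalog/topologies/{app_id}/actions/export"),
        # Import the exported app into the staging environment.
        ("import-to-staging", "POST", "/api/v1/catalog/topologies/actions/import"),
        # Run the Test Mode cases against the imported app.
        ("run-test-cases", "POST", f"/api/v1/catalog/topologies/{app_id}/actions/testrun"),
        # Deploy to the streaming cluster only if the tests pass.
        ("deploy", "POST", f"/api/v1/catalog/topologies/{app_id}/actions/deploy"),
    ]

steps = pipeline_steps(4)
```

Because every stage is a plain HTTP call, the same sequence can be driven from a Jenkinsfile, a shell script, or any other CI tool the team already uses.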
For more details on each of the CI & CD steps outlined above, see the following artifacts:
The following is the result of a CI/CD Jenkins pipeline execution for the trucking streaming analytics app.
SAM makes it incredibly easy to build streaming analytics apps. With SAM Test Mode, the developer can test/validate the app visually before deploying to the cluster. With SAM REST, teams can build automated unit tests and continuous integration and delivery pipelines to meet the needs of the enterprise. Next week, we will talk about the new NiFi and Atlas integrations that were added in HDF 3.1. Stay tuned!
Join us for our HDF 3.1 webinar series where we dig deeper into the features with the Hortonworks product team. Redefining Data-in-Motion with Modern Data Architectures.