Last week, we announced the HDF 3.1 Release and commenced the HDF 3.1 Blog series. Earlier this week, in part 2 of the blog series, we introduced the new HDF component called Apache NiFi Registry, which allows developers to version-control flow artifacts to meet their enterprise SDLC requirements.
Today, in this third part of the series, we discuss Apache Kafka 1.0, which was introduced in the 3.1 release, and the powerful HDF integrations that customers have asked for.
Apache Kafka 0.11/1.0 has some powerful new features and improvements, including support for key/value message headers, improved messaging semantics (e.g., transactional support) and major performance enhancements. For details on these new features, see the following:
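As an illustration of the new messaging semantics, a producer can opt into idempotent, transactional writes with just a few client properties. The following is a minimal sketch; the broker address and transactional ID are placeholders, not values from the demo:

```properties
# Illustrative Kafka 1.0 producer configuration (placeholder values)
bootstrap.servers=broker1:9092
# Idempotence prevents duplicate writes caused by producer retries
enable.idempotence=true
# A stable transactional.id enables atomic writes across topics and partitions
transactional.id=truck-events-producer-1
acks=all
```

With these set, the producer can wrap sends in beginTransaction/commitTransaction calls for exactly-once delivery to downstream consumers reading with isolation.level=read_committed.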
In this blog, we will highlight the HDF integrations that our customers have asked for with Kafka 1.0 to meet their enterprise requirements. We will showcase these integrations in the context of a streaming analytics use case.
The fictitious use case centers around a trucking company that has a large fleet of trucks. Each truck is outfitted with two sensors that emit event data such as timestamp, driver ID, truck ID, route, geographic location, and event type. The geo event sensor emits geographic information (latitude and longitude coordinates) and events such as excessive braking or speeding. The speed sensor emits the speed of the vehicle. The trucking company wants to build real-time data flow apps to ingest the streams; perform routing, transformations, and enrichment; and deliver them to downstream consumers for streaming analytics.
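For concreteness, a geo event from the first sensor might look like the following. This is a hypothetical sketch; the field names and values are illustrative, not the actual demo schema:

```python
import json

# Illustrative geo event emitted by a truck's geo sensor
# (field names and values are assumptions for illustration)
geo_event = {
    "eventTime": "2018-02-14T10:15:30Z",
    "driverId": 11,
    "truckId": 27,
    "route": "Saint Louis to Memphis",
    "latitude": 38.627,
    "longitude": -90.199,
    "eventType": "EXCESSIVE_BRAKING",
}

# Serialize to JSON as the event might travel from MiNiFi through NiFi to Kafka
payload = json.dumps(geo_event)
print(payload)
```

In the actual flow, such records would typically be serialized with a schema (e.g., Avro) managed by Schema Registry rather than loose JSON.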
Working with customers implementing streaming analytics use cases similar to this one, we are seeing Kafka play a crucial role in their architectures. In fact, the use of Kafka has been so prevalent that there is a common set of features that our customers are asking for, typically coming from three different teams in the enterprise.
In the subsequent sections, we will talk through these requirements and how the new HDF 3.1 features address them.
The below diagram illustrates how requirements 1 through 5 are implemented with HDF 3.1 using Apache Kafka 1.0, Apache NiFi, Apache MiNiFi and Schema Registry.
The following summarizes the above diagram describing how requirements 1 through 5 are met:
Requirement 6 is about agility and self service. Working with our large customer base, the single most asked feature centers around the following question:
“How can we make our app-dev and dev-ops teams more agile? We need them to deliver features faster to the Business.”
As the above diagram illustrates, NiFi’s key value proposition is enabling developers to build complex flow apps faster. With NiFi’s Kafka 1.0 support, Kafka-centric requirements like 1 through 5 can be implemented very quickly without having to write a lot of code.
The following shows the configuration of the ConsumeKafkaRecord and PublishKafkaRecord processors that make this possible.
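For reference, the key properties on these record-based processors look roughly like the sketch below. The topic, reader/writer service, and group names are illustrative placeholders, not the exact values from the demo:

```properties
# ConsumeKafkaRecord_1_0 (illustrative values)
Kafka Brokers       = broker1:9092
Topic Name(s)       = truck_events_avro
Record Reader       = AvroReader (schema resolved via Schema Registry)
Record Writer       = AvroRecordSetWriter
Group ID            = nifi-truck-consumer
Security Protocol   = SASL_PLAINTEXT
```

The record reader/writer pairing is what lets NiFi parse, route, and transform individual records in a Kafka message stream without custom code.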
A majority of our customers’ Kafka production deployments are Kerberized with AD integration to support their enterprise security requirements. Apache Ambari makes this otherwise painful Kerberization process considerably easier: it provides a Kerberization wizard that automates all of the details required to Kerberize a multi-node Kafka cluster. See the following diagram for more details about how Req 1 is implemented via Ambari.
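Once the cluster is Kerberized, Kafka clients authenticate via SASL/GSSAPI. Client configuration looks roughly like the following sketch; the principal, realm, and keytab path are placeholders:

```properties
# Illustrative Kafka client security settings for a Kerberized cluster
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
  useKeyTab=true \
  keyTab="/etc/security/keytabs/kafka_client.keytab" \
  principal="truckclient@EXAMPLE.COM";
```

The Ambari wizard generates the broker-side JAAS configuration and keytabs automatically; only client applications outside the cluster need settings like these supplied by hand.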
Apache Ranger is another key component of the enterprise services that power HDF. Ranger provides a centralized platform to define, administer and manage security policies for Kafka. A majority of our customers use Ranger to meet requirement 2 as well as the access control rules described in requirements 3 through 5. The below diagram illustrates these ACL requirements.
The following illustrates how to implement ACL policies for requirements 3, 4 and 5.
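Under the hood, a Ranger Kafka policy of this kind can be expressed as JSON (and created via Ranger’s REST API). The sketch below is illustrative; the service, policy, topic, and user names are placeholders rather than the actual demo values:

```json
{
  "service": "kafka_cluster",
  "name": "truck_events_publish",
  "resources": { "topic": { "values": ["truck_events_avro"] } },
  "policyItems": [
    {
      "users": ["nifi"],
      "accesses": [ { "type": "publish", "isAllowed": true } ]
    }
  ]
}
```

Analogous policies with the "consume" access type grant read access to the downstream analytics consumers, keeping producer and consumer permissions separately auditable.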
Ranger, in addition to providing a centralized policy store, also provides comprehensive audit services that capture and index all authorized/unauthorized access to Kafka resources. With Ranger, the audit requirement 6 can be met as the following shows.
Apache Ambari, which is part of HDF enterprise management services, provides all the capabilities to fulfill requirements 1 through 8 above. The following shows Ambari meeting Req 1 and 2.
The following shows Ambari meeting Req 3 and 4.
The diagram below shows using Ambari Log Search to meet the search requirement 5.
Apache Ambari, through its integration with Grafana and the Ambari Metrics Server (AMS), provides critical monitoring and alerting for Kafka clusters. Ambari provides three custom Grafana dashboards that provide broker- and topic-level KPMs for Kafka 1.0 clusters, as the below diagram illustrates.
Ambari also supports creating custom visualization widgets based on Kafka JMX metrics.
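For example, such widgets can be driven by standard Kafka broker JMX metrics. A few commonly charted MBeans (this list is illustrative, not exhaustive):

```properties
# Throughput into the broker, overall and per topic
kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
# End-to-end produce request latency at the broker
kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce
# Partitions whose replicas are lagging behind the leader
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
```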
From these KPM metrics, the user is able to create custom alerts in Ambari. The below shows the alert definition to determine whether the Kafka process is up or down.
Ambari supports rolling upgrades of Kafka when upgrading from HDF 3.0 to HDF 3.1. This means that the user can upgrade Kafka 0.10.X to Kafka 1.0 without incurring any downtime using Ambari’s rolling upgrade capability.
As we have shown in detail, HDF 3.1 with Kafka 1.0 support and the powerful HDF integrations helps customers meet the enterprise requirements from the app-dev, operations/platform and security teams.
Stay tuned next week as we continue the HDF 3.1 Blog series and discuss topics around testing, continuous integration/delivery and stream operations.
Join us for our HDF 3.1 webinar series, where we dig deeper into the features with the Hortonworks product team: Redefining Data-in-Motion with Modern Data Architectures.