DataTorrent is a Hortonworks Certified Technology Partner and YARN Ready, offering an enterprise-class real-time streaming platform on Hadoop and Hortonworks Data Platform. Thomas Weise, principal architect at DataTorrent, is our guest blogger today.
Kafka is becoming increasingly popular as the data bus to move data in and out of Hadoop clusters. Its scalable architecture and good performance make it a natural fit.
Kafka runs as a cluster of broker servers. There is currently very limited support for the management of these Kafka servers. Hadoop 2.x brought YARN, which is now widely supported as part of Hadoop distros and is emerging as a distributed operating system.
It makes sense to integrate Kafka with YARN. Existing investments and skills can be leveraged. Kafka running under the YARN umbrella can utilize the centrally managed pool of resources. The process monitoring and recovery features of YARN can be extended to provide complete HA for Kafka servers (Kafka provides replicated partitions, but it does not offer automation for dealing with failed brokers).
Generally, building a native YARN application is the better approach, as it exposes the full flexibility of YARN to the developer. The flip side is that building YARN applications isn't easy. It requires deep expertise and time to mature. We built DataTorrent RTS as a native YARN application, deeply integrated through a full-fledged application master and designed to optimally support the unique characteristics of the product.
Given this background, why not set out and write a completely new application master for KOYA (Kafka on YARN)? Considering our goals with KOYA, and that Kafka was built with fault tolerance in mind and already provides most of the HA features, we evaluated Apache Slider. Slider was built to enable long-running services on YARN without making changes to the services themselves. We found it sufficient to bring Kafka to YARN using Slider, as it provides much of the infrastructure required for KOYA.
KOYA requires Hadoop 2.6 and is designed to work on all Hadoop distros with that version. It supports installation with embedded Slider or as a Slider application package that can be added to an existing Slider install. KOYA consists of Python scripts for the agent and configuration files. Creating a Slider application package is straightforward, and it will become easier still once the agent API is better documented and more flexibility is added for component instance configuration.
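To make the idea of per-instance configuration more concrete, here is a minimal sketch of how a Python script could render the settings for a single Kafka broker. It is not taken from KOYA and does not use the Slider agent API; the function name `render_broker_properties` and the example values are hypothetical. The property keys (`broker.id`, `port`, `log.dirs`, `zookeeper.connect`) are standard Kafka broker settings of that era.

```python
def render_broker_properties(broker_id, port, log_dirs, zookeeper_connect):
    """Render a minimal Kafka broker properties file as a string.

    Hypothetical helper for illustration only; an actual Slider package
    would obtain these values from its own configuration mechanism.
    """
    settings = {
        "broker.id": broker_id,                  # must be unique per broker
        "port": port,                            # port the broker listens on
        "log.dirs": ",".join(log_dirs),          # comma-separated data directories
        "zookeeper.connect": zookeeper_connect,  # ZooKeeper connection string
    }
    # One key=value pair per line, in the order defined above.
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    # Example values are made up for the sketch.
    print(render_broker_properties(1, 9092, ["/data/kafka-logs"], "zk1:2181"))
```

Each broker instance needs its own `broker.id` and, when several brokers share a host, its own port and log directories, which is why flexible component instance configuration matters here.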
KOYA is under development as open source, and we are looking to take it forward in collaboration with the Kafka and YARN communities. We are targeting Q2 for the first release. One of our objectives is to provide a dedicated admin web service for the Kafka cluster. We see this as a future part of Kafka that should be integrated as a Slider component, and we plan to work with the Kafka community on it. We also identified a number of enhancements to Slider that we look forward to incorporating in future releases.