February 05, 2015

KOYA on Apache Slider

DataTorrent is a Hortonworks Certified Technology Partner and YARN Ready, offering an enterprise class real-time streaming platform on Hadoop and Hortonworks Data Platform. Thomas Weise, principal architect at DataTorrent, is our guest blogger today.

A while ago, DataTorrent announced a new initiative to integrate Kafka and YARN under the KOYA project. KOYA was proposed as KAFKA-1754 and well received by the community.

Why KOYA?

Kafka is becoming increasingly popular as the data bus to move data in and out of Hadoop clusters. Its scalable architecture and good performance make it a natural fit.

Kafka runs as a cluster of broker servers. There is currently very limited support for the management of these Kafka servers. Hadoop 2.x brought YARN, which is now widely supported as part of Hadoop distros and is emerging as a distributed operating system.

It makes sense to integrate Kafka with YARN. Existing investments and skills can be leveraged. Kafka running under the YARN umbrella can utilize the centrally managed pool of resources. The process monitoring and recovery features of YARN can be extended to provide complete HA for Kafka servers (Kafka provides replicated partitions, but it does not offer automation for dealing with failed brokers).

Why Slider?

Generally, building a native YARN application is the better approach, as it exposes the full flexibility of YARN to the developer. The flipside is that building YARN applications isn’t easy. It requires deep expertise and time to mature. We built DataTorrent RTS as a native YARN application, deeply integrated through a full-fledged application master, designed to optimally support the unique characteristics of the product.

Given this background, why not set out and write a completely new application master for KOYA? Considering our goals with KOYA, and that Kafka was built with fault tolerance in mind and already provides most of the HA features, we evaluated Apache Slider. Slider was built to enable long-running services on YARN without making changes to the services themselves. We found Slider sufficient to bring Kafka to YARN, as it provides much of the infrastructure required for KOYA:

  • YARN application master with management of container failures: Slider will handle the process failure and attempt to restart the process on the same node, if possible.
  • Sticky allocation of components to hosts across AM restart: Kafka stores the data in the local file system and the server process needs to be launched on the same machine. Slider remembers the host on which a component was running previously and can be instructed to pin the component to it.
  • Configuration and deployment of components: The Kafka package can be specified in the Slider application configuration and will be localized by YARN. The user can specify a different version of Kafka, which can be a customized package. Parameters such as server heap size are defined within the same configuration and made available to the KOYA agent script, which provides them as broker server properties. A Kafka cluster can be created by configuring and launching the package through Slider without per node installation or other manual steps.
  • Support for YARN node labels to pin components to a specific set of machines: Allows Kafka servers to be restricted to dedicated machines, to guarantee local disk I/O for optimal performance. With YARN-2139, direct support for disk as a resource is on the horizon.
  • REST API to access component status and deploy information: This provides a central status view for the Kafka cluster. This type of capability is important to operationalize Kafka. Future Slider versions should allow for expanded status information.
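
To make the failure handling and sticky allocation items above more concrete, here is a toy model of the placement decision. It is only a sketch and not Slider's actual code: the class name, method names, and the `strict` flag are made up for illustration. It assumes that a pinned component waits for its previous host rather than moving elsewhere, since the broker's data lives on that machine's local disk.

```python
from typing import Dict, Iterable, Optional


class StickyPlacement:
    """Toy model of sticky placement; NOT Slider's actual implementation."""

    def __init__(self, strict: bool = True) -> None:
        # strict=True models pinning: never fall back to a different host.
        self.strict = strict
        # Remembers the host each component last ran on (hypothetical state).
        self.last_host: Dict[str, str] = {}

    def record(self, component: str, host: str) -> None:
        """Remember where a component instance was running."""
        self.last_host[component] = host

    def choose_host(self, component: str, available: Iterable[str]) -> Optional[str]:
        """Pick a host for a (re)start, or None if the component must wait."""
        candidates = sorted(available)
        previous = self.last_host.get(component)
        if previous is None:
            # First launch: any available host will do.
            return candidates[0] if candidates else None
        if previous in candidates:
            # Restart on the same node so the broker finds its local data.
            return previous
        if self.strict:
            # Pinned: wait until the previous host is available again.
            return None
        return candidates[0] if candidates else None
```

With `strict=True`, a broker recorded on one host is only ever restarted there. With `strict=False`, it falls back to another host, which for Kafka means the broker would start without its local log data and have to catch up from replicas.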

KOYA requires Hadoop 2.6 and is designed to work on all Hadoop distros with that version. It supports installation with embedded Slider or as a Slider application package that can be added to an existing Slider install. KOYA consists of Python scripts for the agent and configuration files. Creating a Slider application package is straightforward, and should become easier once the agent API is better documented and component instance configuration becomes more flexible.
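
As a rough illustration of what an agent script does with the parameters it receives, the sketch below renders a dictionary of settings as broker server properties. It is not taken from the KOYA sources: the function name, the shape of the input dictionary, and the host and path values are assumptions. The property keys shown are standard Kafka broker settings.

```python
from typing import Mapping


def render_broker_properties(params: Mapping[str, object]) -> str:
    """Render parameters as Java-style key=value lines, sorted by key.

    Hypothetical helper for illustration; not part of KOYA or Slider.
    """
    lines = []
    for key in sorted(params):
        value = params[key]
        if isinstance(value, bool):
            # Java properties conventionally use lowercase true/false.
            value = str(value).lower()
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Example values are made up; a real deployment would take them from
# the application configuration.
example = {
    "broker.id": 1,
    "log.dirs": "/data/kafka-logs",
    "zookeeper.connect": "zk1:2181",
    "auto.create.topics.enable": False,
}
print(render_broker_properties(example), end="")
```

Sorting the keys keeps the generated file stable between launches, which makes it easier to compare the configuration of two broker instances.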

What’s Next?

KOYA is under development as open source, and we are looking to take it forward in collaboration with the Kafka and YARN communities. We are targeting Q2 for the first release. One of our objectives is to provide a dedicated admin web service for the Kafka cluster. We see this as a future part of Kafka that should be integrated as a Slider component, and we plan to work with the Kafka community on it. We have also identified a number of enhancements to Slider that we look forward to incorporating in future releases.
