HDP 2.2 brings substantial innovations to Apache Hadoop YARN, enabling users of Apache Hadoop to store their data efficiently in a single repository and interact with it simultaneously using a wide variety of engines. This makes YARN particularly attractive for integrating distributed Long-Running services.
In this release, we also introduced a new framework, Apache™ Slider, for easy onboarding of Long-Running services on top of YARN. This framework enables Long-Running applications or services to be deployed in a YARN environment. By adopting Slider, distributed Long-Running applications that aren't YARN-aware can now participate in the YARN ecosystem without code modification. We also released the Hadoop-based NoSQL databases Apache HBase and Apache Accumulo, and the stream processing system Apache Storm, on top of Slider. More applications are being ported to Slider to take advantage of YARN integration.
Applications can integrate with YARN natively when they need complete control, or use the Slider framework on top of YARN for rapid integration and additional capabilities in a low-cost, future-proof way.
As part of HDP 2.2, we are bringing native support for Long-Running services to existing Hadoop YARN deployments. Deploying Long-Running services on YARN is fundamentally similar to deploying short-lived applications, apart from a few differences.
Steps to integrate a basic application with YARN natively:
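As a rough illustration of the native flow, the sketch below simulates the handshake between a client, the ResourceManager, an Application Master and its worker containers. The classes `FakeResourceManager` and `FakeAppMaster` are hypothetical in-memory stand-ins, not the Hadoop API; the comments name the Java YARN client calls (`YarnClient`, `AMRMClient`, `NMClient`) that each step loosely corresponds to.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    container_id: int
    memory_mb: int
    vcores: int


@dataclass
class FakeResourceManager:
    """Hypothetical in-memory stand-in for the YARN ResourceManager."""
    next_container_id: int = 0
    registered_apps: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def submit_application(self, app_name: str) -> "FakeAppMaster":
        # Java: YarnClient.createApplication() then YarnClient.submitApplication()
        self.log.append(f"submitted {app_name}")
        # The RM allocates a container for the AM and a NodeManager launches it.
        return FakeAppMaster(app_name, self)

    def allocate(self, count: int, memory_mb: int, vcores: int) -> List[Container]:
        # Java: AMRMClient.allocate(progress) returns newly allocated containers.
        containers = []
        for _ in range(count):
            containers.append(Container(self.next_container_id, memory_mb, vcores))
            self.next_container_id += 1
        return containers


class FakeAppMaster:
    """Hypothetical Application Master driving a basic, short-lived job."""

    def __init__(self, app_name: str, rm: FakeResourceManager) -> None:
        self.app_name = app_name
        self.rm = rm
        self.running: List[Container] = []

    def run(self, num_workers: int, memory_mb: int, vcores: int) -> None:
        # Java: AMRMClient.registerApplicationMaster(host, port, trackingUrl)
        self.rm.registered_apps.append(self.app_name)
        self.rm.log.append("AM registered")
        # Java: AMRMClient.addContainerRequest(...) followed by allocate() heartbeats
        for c in self.rm.allocate(num_workers, memory_mb, vcores):
            # Java: NMClient.startContainer(container, launchContext)
            self.running.append(c)
            self.rm.log.append(f"launched container {c.container_id}")
        # Java: AMRMClient.unregisterApplicationMaster(...) once the work is done
        self.rm.log.append("AM unregistered")


rm = FakeResourceManager()
am = rm.submit_application("demo-app")
am.run(num_workers=2, memory_mb=1024, vcores=1)
print(rm.log)
```

A Long-Running service differs mainly in the last step: its Application Master keeps heartbeating and does not unregister until the service is stopped.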
Native Integration using YARN API enables:
To integrate an enterprise-ready application, the Client and Application Master modules of the application must cover a few more capabilities:
Handling these aspects of an enterprise-ready application involves a learning curve for the application developer, along with testing effort and ongoing maintenance against future Hadoop releases.
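One example of this extra work is failure recovery: YARN reports completed containers to the Application Master, and it is up to the Application Master of a Long-Running service to request replacements. The sketch below shows only that bookkeeping. `ServiceMaster`, `ContainerStatus` and the failure budget are hypothetical; a real Application Master would feed them from its `AMRMClient` allocate responses.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ContainerStatus:
    container_id: str
    exit_status: int  # 0 = clean exit; any other value is counted as a failure here


class ServiceMaster:
    """Hypothetical AM-side bookkeeping that keeps `desired` containers running."""

    def __init__(self, desired: int, max_failures: int) -> None:
        self.desired = desired
        self.max_failures = max_failures  # assumed failure budget before giving up
        self.live: Set[str] = set()       # containers currently running
        self.pending = 0                  # requests sent but not yet satisfied
        self.failures = 0

    def request_missing(self) -> int:
        """Number of new container requests to send (one addContainerRequest each in Java)."""
        missing = max(0, self.desired - len(self.live) - self.pending)
        self.pending += missing
        return missing

    def on_allocated(self, container_ids: List[str]) -> None:
        """Record containers YARN granted against outstanding requests."""
        for cid in container_ids:
            self.live.add(cid)
            self.pending -= 1

    def on_completed(self, statuses: List[ContainerStatus]) -> None:
        """Drop finished containers; a service is expected to run forever, so any exit leaves a gap."""
        for status in statuses:
            self.live.discard(status.container_id)
            if status.exit_status != 0:
                self.failures += 1
        if self.failures > self.max_failures:
            raise RuntimeError("failure budget exhausted; stopping the service")


# A service that should keep three instances alive.
am = ServiceMaster(desired=3, max_failures=2)
print(am.request_missing())  # 3
am.on_allocated(["c1", "c2", "c3"])
am.on_completed([ContainerStatus("c2", exit_status=1)])
print(am.request_missing())  # 1: only the lost instance is re-requested
```

Capping failures keeps a misconfigured service from looping through container launches indefinitely; the right budget depends on the service.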
As part of HDP 2.2, we are also introducing Slider, a framework to make it easy to deploy and manage existing applications in a Hadoop cluster on YARN.
Slider manages applications by launching a YARN Application Master for every application instance, and an agent in every resource container that YARN allocates according to the application's resource requirements. After launch, the Application Master can allocate or de-allocate resources and stop or start application instances in response to an application administrator's request (through the Slider client or the Ambari integration) or to YARN's resource scheduling pre-emptions.
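Both kinds of trigger, an administrator's request and a pre-emption, come down to reconciling the instance counts that are wanted with those actually running. The function below is a hypothetical sketch of that reconciliation, not Slider's implementation; `plan_flex` and the HBase-style component names are illustrative.

```python
from typing import Dict, Tuple


def plan_flex(desired: Dict[str, int], actual: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Compare desired and running instance counts per component.

    Returns (to_request, to_release): how many containers to ask YARN for and
    how many running instances to stop, keyed by component name.
    """
    to_request: Dict[str, int] = {}
    to_release: Dict[str, int] = {}
    for component in sorted(set(desired) | set(actual)):
        delta = desired.get(component, 0) - actual.get(component, 0)
        if delta > 0:
            to_request[component] = delta
        elif delta < 0:
            to_release[component] = -delta
    return to_request, to_release


# Administrator flexes region servers from 3 to 5 and masters from 2 to 1.
print(plan_flex({"HBASE_MASTER": 1, "HBASE_REGIONSERVER": 5},
                {"HBASE_MASTER": 2, "HBASE_REGIONSERVER": 3}))
# ({'HBASE_REGIONSERVER': 2}, {'HBASE_MASTER': 1})

# YARN preempts one region server; the same reconciliation requests a replacement.
print(plan_flex({"HBASE_MASTER": 1, "HBASE_REGIONSERVER": 5},
                {"HBASE_MASTER": 1, "HBASE_REGIONSERVER": 4}))
# ({'HBASE_REGIONSERVER': 1}, {})
```

Treating every trigger as a change to "desired versus actual" keeps a single code path for flexing up, flexing down and recovering from pre-emption.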
Integration of Long-Running services via Slider provides the following benefits without any additional code:
Slider views any application as a set of components, where each component is a daemon or executable with its own configuration, scripts, data files, and so on. Components may have one or more instances. Slider manages application instances by managing component instances.
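The component model can be pictured as a small data structure. The sketch below is illustrative only: `Component`, `Application` and the per-instance resource fields are hypothetical names, not Slider classes. It shows how an application's total footprint and its individually managed instances follow from the per-component definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Component:
    """One daemon or executable in a Slider-style application (illustrative)."""
    name: str
    instances: int
    memory_mb: int  # per instance
    vcores: int     # per instance
    config: Dict[str, str] = field(default_factory=dict)
    scripts: List[str] = field(default_factory=list)


@dataclass
class Application:
    name: str
    components: List[Component]

    def total_resources(self) -> Tuple[int, int]:
        """Total (memory_mb, vcores) across all component instances."""
        memory = sum(c.instances * c.memory_mb for c in self.components)
        vcores = sum(c.instances * c.vcores for c in self.components)
        return memory, vcores

    def instance_names(self) -> List[str]:
        """The application is managed through its individual component instances."""
        return [f"{c.name}-{i}" for c in self.components for i in range(c.instances)]


app = Application("hbase-demo", [
    Component("HBASE_MASTER", instances=1, memory_mb=512, vcores=1),
    Component("HBASE_REGIONSERVER", instances=3, memory_mb=1024, vcores=1),
])
print(app.total_resources())  # (3584, 4)
print(app.instance_names())
# ['HBASE_MASTER-0', 'HBASE_REGIONSERVER-0', 'HBASE_REGIONSERVER-1', 'HBASE_REGIONSERVER-2']
```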
Integrating an application with Slider involves the following steps:
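Part of integrating an application with Slider is describing the resource requirements of each component. The sketch below renders such a description as JSON. Its layout (the `components` map, the `slider-appmaster` entry and the `yarn.component.instances`, `yarn.memory` and `yarn.role.priority` keys) is modeled on the resources.json files in Slider application packages and should be treated as an assumption to verify against the Slider documentation for the release in use. `build_resources_spec` is a hypothetical helper, and assigning priorities by sorted component name is an arbitrary choice made here for determinism.

```python
import json
from typing import Dict


def build_resources_spec(components: Dict[str, Dict[str, int]]) -> str:
    """Render per-component instance counts and memory as resources.json-style JSON.

    Key names are an assumption modeled on Slider's resources.json; numeric
    values are written as strings, as in the Slider examples.
    """
    spec = {
        "metadata": {},
        "global": {},
        "components": {"slider-appmaster": {}},
    }
    # Give each component a distinct priority, ordered by component name.
    for priority, (name, res) in enumerate(sorted(components.items()), start=1):
        spec["components"][name] = {
            "yarn.role.priority": str(priority),
            "yarn.component.instances": str(res["instances"]),
            "yarn.memory": str(res["memory_mb"]),
        }
    return json.dumps(spec, indent=2)


text = build_resources_spec({
    "HBASE_MASTER": {"instances": 1, "memory_mb": 512},
    "HBASE_REGIONSERVER": {"instances": 3, "memory_mb": 1024},
})
parsed = json.loads(text)
print(parsed["components"]["HBASE_REGIONSERVER"])
# {'yarn.role.priority': '2', 'yarn.component.instances': '3', 'yarn.memory': '1024'}
```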
Apart from enabling the application to integrate with YARN, the Slider framework provides many other features that are critical for any Long-Running service on top of YARN:
YARN provides multiple ways to integrate Long-Running services. Native YARN API integration is ideal for large-scale distributed algorithms such as MapReduce, or for Long-Running services with specific placement and scheduling needs. For other Long-Running services or applications, a framework like Slider is worth considering for its ease of integration and the value-added features it offers.