Building YARN Apps for Hadoop
YARN is the data operating system of Hadoop that enables you to process data simultaneously in multiple ways. YARN provides the resource management and pluggable architecture that let a wide variety of data access methods operate on data stored in Hadoop with predictable performance and service levels.
Develop with YARN Engines
Engines such as Apache Tez and Apache Slider provide powerful frameworks to rapidly integrate third-party processing and services. The YARN APIs can also be used natively where complete control is needed. As a developer, you can choose the option that suits your needs.
Apache™ Tez generalizes the MapReduce paradigm to a more powerful framework for executing a complex DAG (directed acyclic graph) of tasks. By eliminating unnecessary tasks, synchronization barriers, and reads from and writes to HDFS, Tez speeds up data processing across both small-scale, low-latency and large-scale, high-throughput workloads.
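To make the DAG idea concrete, here is a minimal sketch in plain Python. It is not the Tez API and makes no Hadoop calls; the function name run_dag and the task names are hypothetical, chosen only for illustration. It assumes Python 3.9 or later (for graphlib). Each task receives the in-memory results of the tasks it depends on, and a task becomes runnable as soon as its own inputs are done, which loosely mirrors the idea of avoiding a global barrier and an intermediate write to storage between stages.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps):
    """Run tasks in dependency order, passing results in memory.

    tasks: dict mapping task name -> callable taking a dict of upstream results
    deps:  dict mapping task name -> set of task names it depends on
    Returns (results, waves), where waves lists the tasks that became
    runnable together at each step.
    """
    graph = {name: set(deps.get(name, ())) for name in tasks}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic

    results, waves = {}, []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose inputs are all done
        for name in ready:
            inputs = {dep: results[dep] for dep in graph[name]}
            results[name] = tasks[name](inputs)  # hand over results in memory
            sorter.done(name)
        waves.append(ready)
    return results, waves


# Hypothetical four-task job: two independent loads, a join, and a total.
tasks = {
    "load_a": lambda inputs: [1, 2, 3],
    "load_b": lambda inputs: [10, 20],
    "join": lambda inputs: inputs["load_a"] + inputs["load_b"],
    "total": lambda inputs: sum(inputs["join"]),
}
deps = {"join": {"load_a", "load_b"}, "total": {"join"}}

results, waves = run_dag(tasks, deps)
print(waves)             # [['load_a', 'load_b'], ['join'], ['total']]
print(results["total"])  # 36
```

The two load tasks have no dependency on each other, so they appear in the same wave; a real engine would run such tasks in parallel across the cluster rather than sequentially as this sketch does.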
Apache™ Slider is an engine that runs other applications in a YARN environment. With Slider, distributed applications that aren’t YARN-aware can now participate in the YARN ecosystem – usually with no code modification. Slider allows applications to use Hadoop’s data and processing resources, as well as the security, governance, and operations capabilities of enterprise Hadoop.
Applications integrating with Slider and Tez are eligible for certification in the YARN Ready program.
Develop with YARN APIs
YARN has become the data operating system for Hadoop and is the architectural center for development of Hadoop-based applications. The resources below can help you understand the YARN-based architecture of Hadoop 2 and how to build apps that take full advantage of it.
STEP 1. Understand the motivations and architecture for YARN.
- Introducing Apache Hadoop YARN
- Apache Hadoop YARN – Background and an Overview
- Apache Hadoop YARN – Concepts and Applications
- Apache Hadoop YARN – ResourceManager
- Apache Hadoop YARN – NodeManager
- Running existing applications on Hadoop 2 YARN
- Stabilizing YARN APIs for Apache Hadoop 2
- Management of Application Dependencies
- Resource Localization in YARN: Deep Dive
- Simplifying user-logs management and access in YARN
STEP 2. Explore example applications on YARN.
The simple applications in this section show how to build and deploy apps against the YARN APIs and are a good way to get started. They can be easily reproduced in the Hortonworks Sandbox VM environment.
- Simple YARN App. This ‘Hello World’ app for YARN runs n copies of a Unix command.
- Distributed Shell. This fuller example implements a distributed shell on YARN.
- MemcacheD on YARN. A tutorial showing how to deploy the popular MemcacheD caching system on YARN.
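As a rough picture of what the Simple YARN App does, the sketch below runs n copies of a command concurrently on the local machine and collects each copy's exit code and output. It is only a local stand-in: it uses Python's standard library, makes no YARN calls, and does not request containers from a ResourceManager. The function name run_copies is hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_copies(command, n):
    """Run n copies of command concurrently.

    command: list of strings, e.g. ["echo", "hello"]
    Returns a list of (exit_code, stdout) tuples, one per copy.
    """
    def run_one(_index):
        finished = subprocess.run(command, capture_output=True, text=True)
        return finished.returncode, finished.stdout

    # One worker thread per copy; each thread just waits on its subprocess.
    with ThreadPoolExecutor(max_workers=max(1, n)) as pool:
        return list(pool.map(run_one, range(n)))


if __name__ == "__main__":
    for exit_code, output in run_copies(["echo", "hello from one copy"], 3):
        print(exit_code, output.strip())
```

In the real application, each copy would run inside a YARN container on some node in the cluster, and an ApplicationMaster would track the containers' completion instead of a local thread pool.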
STEP 3. Examine real-world applications on YARN.
These richer applications are built on YARN and demonstrate real-world use and deployment.
- MapReduce on YARN. The official codebase for Apache Hadoop MapReduce on YARN (MR2).
- HBase on YARN. Efforts to deploy HBase on YARN.
The following resources can also assist with developing Hadoop-based Apps on YARN.