Building YARN Apps for Hadoop

Develop with YARN as the Data Operating System for Enterprise Hadoop

YARN is the data operating system of Hadoop that enables you to process data simultaneously in multiple ways. YARN provides the resource management and pluggable architecture that let a wide variety of data access methods operate on data stored in Hadoop with predictable performance and service levels.
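To make "resource management" concrete: the YARN ResourceManager reports cluster-wide capacity through its REST API at `/ws/v1/cluster/metrics`. The Python sketch below is a minimal illustration, assuming a ResourceManager at the hypothetical address `http://resourcemanager.example.com:8088` (8088 is the default ResourceManager web port). It reads only a few of the fields that endpoint returns and computes how much memory and how many vcores are currently allocated.

```python
import json
import urllib.request

# Hypothetical ResourceManager address; 8088 is the default RM web/REST port.
RM_URL = "http://resourcemanager.example.com:8088"


def fetch_cluster_metrics(rm_url: str = RM_URL) -> dict:
    """Return the 'clusterMetrics' object from the ResourceManager REST API."""
    with urllib.request.urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def summarize(metrics: dict) -> dict:
    """Compute the share of cluster memory and vcores currently allocated."""
    total_mb = metrics["totalMB"]
    total_vcores = metrics["totalVirtualCores"]
    return {
        # Guard against an empty cluster (no NodeManagers registered yet).
        "memory_used_pct": round(100 * metrics["allocatedMB"] / total_mb, 1) if total_mb else 0.0,
        "vcores_used_pct": round(100 * metrics["allocatedVirtualCores"] / total_vcores, 1) if total_vcores else 0.0,
        "available_mb": metrics["availableMB"],
        "available_vcores": metrics["availableVirtualCores"],
    }
```

On a live cluster, `summarize(fetch_cluster_metrics())` returns a small dictionary you could log or feed into a dashboard. The network call and the arithmetic are kept in separate functions so the summary logic can be checked without a cluster.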

Develop with YARN Engines

Engines such as Apache Tez and Apache Slider provide powerful frameworks for rapidly integrating third-party processing and services. Where you need complete control, you can use the YARN APIs natively. As a developer, you can choose the option that suits your needs.


Apache Tez

Apache™ Tez generalizes the MapReduce paradigm into a more powerful framework for executing a complex DAG (directed acyclic graph) of tasks. By eliminating unnecessary tasks, synchronization barriers, and intermediate reads from and writes to HDFS, Tez speeds up data processing for both small-scale, low-latency workloads and large-scale, high-throughput workloads.
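The scheduling idea behind a DAG engine can be illustrated without Tez itself. The sketch below is a conceptual model only, not the Tez API: it takes a hypothetical query plan written as "vertex → downstream vertices" and uses Kahn's topological-sort algorithm to group vertices into waves. Every vertex in a wave depends only on earlier waves, so independent work (such as two table scans) can run side by side within one DAG instead of being serialized into chained MapReduce jobs that each write their output to HDFS.

```python
from collections import defaultdict


def execution_waves(edges: dict[str, list[str]]) -> list[list[str]]:
    """Group DAG vertices into waves using Kahn's algorithm.

    Every vertex in a wave depends only on vertices from earlier waves,
    so all vertices in the same wave could run concurrently.
    Raises ValueError if the graph contains a cycle.
    """
    indegree: dict[str, int] = defaultdict(int)
    vertices = set(edges)
    for dsts in edges.values():
        for dst in dsts:
            indegree[dst] += 1
            vertices.add(dst)

    # Start with vertices that have no inputs (e.g. source scans).
    ready = sorted(v for v in vertices if indegree[v] == 0)
    waves: list[list[str]] = []
    scheduled = 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        next_ready = []
        for v in ready:
            for dst in edges.get(v, []):
                indegree[dst] -= 1
                if indegree[dst] == 0:  # all inputs of dst are now available
                    next_ready.append(dst)
        ready = sorted(next_ready)

    if scheduled != len(vertices):
        raise ValueError("graph has a cycle; not a DAG")
    return waves
```

For a hypothetical join-then-aggregate plan, `execution_waves({"scan_orders": ["join"], "scan_customers": ["join"], "join": ["aggregate"], "aggregate": ["sort"]})` places both scans in the first wave, followed by `join`, `aggregate`, and `sort`.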

Apache Slider

Apache™ Slider is an engine that runs other applications in a YARN environment. With Slider, distributed applications that aren’t YARN-aware can participate in the YARN ecosystem, usually with no code modification. Slider allows applications to use Hadoop’s data and processing resources, as well as the security, governance, and operations capabilities of enterprise Hadoop.

Data processing engines such as Apache Hive, HBase, and Storm already take advantage of the available YARN APIs and engines, which makes them more powerful and versatile than ever before.
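Because these engines all run as YARN applications, the ResourceManager can show them side by side. The sketch below assumes a ResourceManager at the hypothetical address `http://resourcemanager.example.com:8088` and uses the REST endpoint `/ws/v1/cluster/apps?states=RUNNING` to tally running applications and their allocated memory by `applicationType`. The exact type strings you see (for example `TEZ` or `MAPREDUCE`) depend on which engines are deployed on your cluster.

```python
import json
import urllib.request

RM_URL = "http://resourcemanager.example.com:8088"  # hypothetical address


def extract_apps(body: dict) -> list[dict]:
    """Pull the application list out of an /ws/v1/cluster/apps response.

    The response may carry "apps": null when no application matches
    the filter, so treat a missing or null value as an empty list.
    """
    apps = body.get("apps") or {}
    return apps.get("app", [])


def fetch_running_apps(rm_url: str = RM_URL) -> list[dict]:
    """Query the ResourceManager for applications in the RUNNING state."""
    with urllib.request.urlopen(f"{rm_url}/ws/v1/cluster/apps?states=RUNNING", timeout=10) as resp:
        return extract_apps(json.load(resp))


def usage_by_type(apps: list[dict]) -> dict[str, dict[str, int]]:
    """Count applications and sum allocated memory (MB) per application type."""
    usage: dict[str, dict[str, int]] = {}
    for app in apps:
        entry = usage.setdefault(app.get("applicationType", "UNKNOWN"), {"apps": 0, "allocatedMB": 0})
        entry["apps"] += 1
        # Clamp defensively in case the RM reports a negative placeholder value.
        entry["allocatedMB"] += max(app.get("allocatedMB", 0), 0)
    return usage
```

Calling `usage_by_type(fetch_running_apps())` gives a quick view of how the shared cluster is being divided among engines at that moment.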

Applications integrating with Slider and Tez are eligible for certification in the YARN Ready program.

Develop with YARN APIs

YARN has become the data operating system for Hadoop and is the architectural center for development of Hadoop-based applications. The resources below can help you understand the YARN-based architecture of Hadoop 2 and how to build apps that take full advantage of it.

Get an overview of Apache Hadoop YARN concepts in this slide deck.

STEP 1. Understand the motivations and architecture for YARN.
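The core of the architecture is a split of responsibilities. The ResourceManager arbitrates cluster resources, while a per-application ApplicationMaster runs in a container and negotiates any further containers its tasks need. One way to see this split without writing Java is the ResourceManager's REST submission API, which is available in newer Hadoop 2.x releases (check your version's ResourceManager REST documentation). A client first POSTs to `/ws/v1/cluster/apps/new-application` to obtain an application ID, and then POSTs a submission context to `/ws/v1/cluster/apps` that describes the ApplicationMaster's container. This is a minimal sketch. The address, application name, and AM command are hypothetical placeholders, and a real submission usually also needs `local-resources` (such as the AM jar in HDFS) and security settings.

```python
import json
import urllib.request

RM_URL = "http://resourcemanager.example.com:8088"  # hypothetical address


def new_application(rm_url: str = RM_URL) -> str:
    """Step 1: ask the ResourceManager for a fresh application ID."""
    req = urllib.request.Request(f"{rm_url}/ws/v1/cluster/apps/new-application",
                                 data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["application-id"]


def build_submission(app_id: str, name: str, am_command: str,
                     am_memory_mb: int = 1024, am_vcores: int = 1,
                     max_attempts: int = 2) -> dict:
    """Step 2: describe the container in which the RM should launch the AM.

    The AM started by `am_command` is then responsible for requesting
    the containers its own tasks need.
    """
    if am_memory_mb <= 0 or am_vcores <= 0:
        raise ValueError("the AM container needs positive memory and vcores")
    return {
        "application-id": app_id,
        "application-name": name,
        "application-type": "YARN",
        "am-container-spec": {"commands": {"command": am_command}},
        "resource": {"memory": am_memory_mb, "vCores": am_vcores},
        "max-app-attempts": max_attempts,
        "unmanaged-AM": False,
        "keep-containers-across-application-attempts": False,
    }


def submit(body: dict, rm_url: str = RM_URL) -> int:
    """POST the submission context and return the HTTP status code."""
    req = urllib.request.Request(f"{rm_url}/ws/v1/cluster/apps",
                                 data=json.dumps(body).encode("utf-8"),
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

The same two-step handshake (get an ID, then submit an AM container spec) is what the Java `YarnClient` API performs for native YARN applications.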


Building Apps

STEP 2. Explore example applications on YARN.

The simple applications in this section show how to build and deploy apps against the YARN APIs and are a simple way to get started. These apps can be easily replicated in the Hortonworks Sandbox VM environment.

  • Simple YARN App. This ‘Hello World’ app for YARN runs n copies of a Unix command.
  • Distributed Shell. This fuller example implements a distributed shell on YARN.
  • MemcacheD on YARN. A tutorial showing how to deploy the popular MemcacheD distributed caching system on YARN.
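Both the Simple YARN App and Distributed Shell come down to "run this command in N containers." Hadoop ships Distributed Shell as the client class `org.apache.hadoop.yarn.applications.distributedshell.Client`, which accepts options such as `-jar`, `-shell_command`, `-num_containers`, and `-container_memory`. The Python sketch below assembles that invocation. The jar path is a hypothetical placeholder because its file name and location depend on your Hadoop version and distribution, and the sketch assumes the `yarn` launcher is on `PATH`.

```python
import subprocess

# Hypothetical path: the real jar name usually includes the Hadoop version.
DSHELL_JAR = "/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar"
DSHELL_CLIENT = "org.apache.hadoop.yarn.applications.distributedshell.Client"


def dshell_argv(command: str, num_containers: int,
                container_memory_mb: int = 128, jar: str = DSHELL_JAR) -> list[str]:
    """Build the argument list that runs `command` in `num_containers` containers."""
    if num_containers < 1:
        raise ValueError("num_containers must be at least 1")
    return [
        "yarn", "jar", jar, DSHELL_CLIENT,
        "-jar", jar,                          # jar containing the ApplicationMaster
        "-shell_command", command,            # command each container runs
        "-num_containers", str(num_containers),
        "-container_memory", str(container_memory_mb),  # per-container memory in MB
    ]


def run_dshell(command: str, num_containers: int) -> int:
    """Launch the Distributed Shell client and return its exit code."""
    return subprocess.run(dshell_argv(command, num_containers), check=False).returncode
```

For example, `run_dshell("date", 2)` would ask YARN to run `date` in two containers, which is the same "n copies of a Unix command" idea as the Simple YARN App. The output of each container ends up in that container's YARN logs rather than on your terminal.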

STEP 3. Examine real-world applications on YARN.

These richer applications are built on YARN and demonstrate real-world use and deployment.

Further Resources

The following resources can also help with developing Hadoop-based apps on YARN.

Companies using YARN

Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is truly enterprise grade, having been built, tested, and hardened with enterprise rigor.
Get started with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.
Modern Data Architecture
Tackle the challenges of big data. Hadoop integrates with existing EDW, RDBMS and MPP systems to deliver lower cost, higher capacity infrastructure.