Building Hadoop-based Apps on YARN

Take advantage of Hadoop 2 and YARN as the Data Operating System

Apache Hadoop YARN changes the game for Hadoop applications by turning Hadoop into a multi-application, multi-workload, general-purpose data operating system. YARN is:

  1. Flexible

    Store data once and interact with it in multiple ways from batch to interactive to real time and streaming.

    Architected to enable new workloads.

  2. Shared

    Re-use key platform services for reliability, redundancy and security across multiple workloads.

    Multi-tenant architecture shares core resources while isolating services and data.

  3. Efficient

    Do more with less: an efficiency gain of 30% or more in the utilization of existing resources.

    Share and segment applications based on cluster resource management.

This set of resources is intended to get you up and running developing apps for YARN.

STEP 1. Understand the motivations and architecture for YARN.

Apache Hadoop YARN is the data operating system for Hadoop 2.0. YARN enables a user to interact with all data in multiple ways simultaneously, making Hadoop a true multi-use data platform and allowing it to take its place in a modern data architecture. Find out more about the concepts and specifics of YARN.


Get an overview of Apache Hadoop YARN concepts in this slide deck.


STEP 2. Explore example applications on YARN.

The applications in this section show how to build and deploy apps against the YARN APIs and are an easy way to get started. Each of them can be reproduced in the Hortonworks Sandbox VM environment.

  • Simple YARN App. This ‘Hello World’ app for YARN runs n copies of a Unix command.
  • Distributed Shell. This fuller example implements a distributed shell on YARN.
  • MemcacheD on YARN. A tutorial showing how to deploy the popular memcached caching system on YARN.
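
As a rough illustration of the Distributed Shell example above, the following sketch assembles the command line that asks YARN to run a shell command in several containers. It only builds and prints the command; it does not contact a cluster. The jar path is a hypothetical placeholder and the helper name `distributed_shell_argv` is ours, not part of Hadoop. The client class and the `-jar`, `-shell_command` and `-num_containers` options are those of the Distributed Shell client shipped with Apache Hadoop, but check them against your Hadoop version.

```python
import shlex

# Client class of the Distributed Shell example that ships with Apache Hadoop.
DSHELL_CLIENT = "org.apache.hadoop.yarn.applications.distributedshell.Client"


def distributed_shell_argv(jar_path, shell_command, num_containers):
    """Build the argument list for launching Distributed Shell via `yarn jar`.

    Hypothetical helper for illustration only; nothing is executed here.
    """
    if num_containers < 1:
        raise ValueError("num_containers must be at least 1")
    return [
        "yarn", "jar", jar_path, DSHELL_CLIENT,
        "-jar", jar_path,                        # jar that holds the ApplicationMaster
        "-shell_command", shell_command,         # command run in each container
        "-num_containers", str(num_containers),  # how many copies to run
    ]


# Assumed jar location; adjust it for your installation or Sandbox VM.
jar = "/path/to/hadoop-yarn-applications-distributedshell.jar"
argv = distributed_shell_argv(jar, "date", 3)
print(shlex.join(argv))
```

On a machine with a configured Hadoop client, the printed command could be pasted into a shell to run `date` in three containers.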

STEP 3. Examine real-world applications on YARN.

These richer applications are built on YARN and demonstrate real-world use and deployment.

FURTHER RESOURCES

The following resources can also assist with developing Hadoop-based Apps on YARN.

TRAINING

Hortonworks also provides training and certification for Hadoop.

Get started with Sandbox

Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.

Hortonworks Data Platform

The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is enterprise grade, having been built, tested and hardened with enterprise rigor.

Modern Data Architecture

Tackle the challenges of big data. Hadoop integrates with existing EDW, RDBMS and MPP systems to deliver lower-cost, higher-capacity infrastructure.
