YARN, the Hadoop Operating System

Making Hadoop a True Multi-Use Data Platform

Apache Hadoop YARN is the data operating system for Hadoop 2. YARN enables a user to interact with all data in multiple ways simultaneously, making Hadoop a true multi-use data platform and allowing it to take its place in a modern data architecture.

Initiative Goals

Flexibility
Enable data processing models beyond MapReduce (batch) such as interactive, streaming and search.
Efficiency
Double the processing done in Hadoop on the same hardware while providing predictable performance and quality of service.
Resource Sharing
Provide a stable, common set of shared resources across multiple, co-ordinated workloads in Hadoop.

Status: Delivered

The genesis of YARN and Hadoop 2 dates back to a Jira ticket (MAPREDUCE-279) raised in January 2008 by Hortonworks co-founder Arun Murthy. YARN is the result of 5 years of subsequent development in the open community.

From Batch to YARN-based Hadoop

YARN has been tested by Yahoo! since September 2012 and has been in production across 30,000 nodes and 325PB of data since January 2013. More recently, other enterprises such as Microsoft, eBay, Twitter, XING and Spotify have adopted a YARN-based architecture.

Flexibility

By separating Hadoop's original processing engine (MapReduce) from resource management, YARN acts as the operating system for Hadoop. This means that many different processing engines can operate across a Hadoop cluster, on the same data, at the same time.
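The separation described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not YARN's actual API: `ToyResourceManager`, `Container`, and the application names are hypothetical, and memory is the only resource tracked. The point it illustrates is that the resource manager hands out capacity without knowing anything about how each engine processes data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Container:
    """A slice of cluster resources granted to one application (hypothetical type)."""
    app: str
    memory_mb: int


@dataclass
class ToyResourceManager:
    """Tracks cluster capacity only; it has no knowledge of processing logic."""
    total_memory_mb: int
    granted: List[Container] = field(default_factory=list)

    def free_memory_mb(self) -> int:
        # Capacity not currently held by any container.
        return self.total_memory_mb - sum(c.memory_mb for c in self.granted)

    def request(self, app: str, memory_mb: int) -> Optional[Container]:
        """Grant a container if capacity allows, otherwise return None."""
        if memory_mb <= self.free_memory_mb():
            container = Container(app, memory_mb)
            self.granted.append(container)
            return container
        return None

    def release(self, container: Container) -> None:
        """Return a container's resources to the shared pool."""
        self.granted.remove(container)


# Different engines share one cluster through the same manager.
rm = ToyResourceManager(total_memory_mb=8192)
batch = rm.request("mapreduce-job", 4096)    # granted
stream = rm.request("storm-topology", 2048)  # granted
sql = rm.request("hive-query", 4096)         # None: only 2048 MB is free
rm.release(batch)                            # batch work finishes
sql = rm.request("hive-query", 4096)         # granted once capacity frees up
```

In this sketch the batch, streaming, and SQL workloads never talk to each other; they only negotiate capacity with the manager, which is the design choice the paragraph above describes.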


Efficiency & Shared Resources

YARN’s dynamic resource allocation doubled Hadoop’s processing power, while providing the same predictable performance and quality of service. YARN provides a fabric of stable, shared resources across multiple co-ordinated workloads:

  • Batch – MapReduce
  • Script – Pig
  • Interactive SQL – Hive, Tez and ORCFile
  • NoSQL – HBase and Accumulo
  • Stream – Storm
  • Search – Solr
  • Preview of In-Memory – Spark
  • And More to Come…
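One way to see why a shared, dynamically allocated pool can raise utilization is to compare it with fixed partitioning. The sketch below is a toy model, assuming as background (not stated on this page) that Hadoop 1 reserved separate map and reduce slots on each node. The function names and all numbers are invented for illustration and say nothing about the measured gains quoted above.

```python
def static_slots_running(map_slots: int, reduce_slots: int,
                         map_tasks: int, reduce_tasks: int) -> int:
    """Fixed partitioning: each slot type can only run its own task type."""
    return min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)


def shared_pool_running(total_slots: int,
                        map_tasks: int, reduce_tasks: int) -> int:
    """Shared pool: any free container can run any pending task."""
    return min(total_slots, map_tasks + reduce_tasks)


# A map-heavy moment: 10 map tasks pending, no reduce tasks yet.
static_running = static_slots_running(map_slots=5, reduce_slots=5,
                                      map_tasks=10, reduce_tasks=0)
shared_running = shared_pool_running(total_slots=10,
                                     map_tasks=10, reduce_tasks=0)
print(static_running, shared_running)
```

With a balanced mix of tasks the two models run the same amount of work; the difference only appears when demand is skewed toward one kind of task, which is when idle reserved capacity would otherwise go unused.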

Essential Timeline

YARN: Initiation (Delivered to Trunk, Oct 2011)
  • MAPREDUCE-279 (Jan 2008)
  • Merged to trunk (Oct 2011)
YARN: Stabilization (Delivered, Aug 2013)
  • YARN in UAT (Yahoo!)
  • YARN in Production (Yahoo!)
  • Hadoop 2 Beta (Aug 2013)
YARN: Implementation (Delivered, Hadoop 2 GA, Oct 2013)
  • YARN/Hadoop 2.0 GA
  • HDP 2.0

Resources

Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is truly enterprise grade, having been built, tested and hardened with enterprise rigor.
HDP 2.1 Webinar Series
Join us for a series of talks on some of the new enterprise functionality available in HDP 2.1, including data governance, security, operations and data access.
Modern Data Architecture
Tackle the challenges of big data. Hadoop integrates with existing EDW, RDBMS and MPP systems to deliver lower-cost, higher-capacity infrastructure.