YARN, the Hadoop Operating System

Making Hadoop a True Multi-Use Data Platform

Apache Hadoop YARN is the data operating system for Hadoop 2. YARN enables a user to interact with all data in multiple ways simultaneously, making Hadoop a true multi-use data platform and allowing it to take its place in a modern data architecture.

Initiative Goals

  • Enable data processing models beyond MapReduce (batch), such as interactive, streaming and search.
  • Double Hadoop's processing power on the same hardware while providing predictable performance and quality of service.
  • Resource sharing: provide a stable, common set of shared resources across multiple co-ordinated workloads in Hadoop.

Status: Delivered

The genesis of YARN and Hadoop 2 dates back to a Jira ticket (MAPREDUCE-279) raised in January 2008 by Hortonworks co-founder Arun Murthy. YARN is the result of 5 years of subsequent development in the open community.

From Batch to YARN-based Hadoop

YARN has been tested by Yahoo! since September 2012 and has been in production across 30,000 nodes and 325PB of data since January 2013. More recently, other enterprises such as Microsoft, eBay, Twitter, XING and Spotify have adopted a YARN-based architecture.


By separating resource management from Hadoop's original processing engine (MapReduce), YARN acts as the operating system for Hadoop. This means that many different processing engines can operate simultaneously across a Hadoop cluster, on the same data.
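The separation described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyResourceManager`, its methods, and the memory figures are hypothetical names and numbers invented for illustration, not YARN's actual API or scheduler. It shows one shared pool of cluster memory being handed out to several processing engines, with requests refused when the pool is exhausted and granted again once another engine releases its share.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster-wide resource manager (not YARN's API)."""
    total_memory_mb: int
    allocations: dict = field(default_factory=dict)  # engine name -> MB granted

    def free_memory_mb(self) -> int:
        # Memory not currently held by any engine.
        return self.total_memory_mb - sum(self.allocations.values())

    def request(self, engine: str, memory_mb: int) -> bool:
        """Grant the request only if enough cluster memory is still free."""
        if memory_mb <= 0 or memory_mb > self.free_memory_mb():
            return False
        self.allocations[engine] = self.allocations.get(engine, 0) + memory_mb
        return True

    def release(self, engine: str) -> None:
        """Return everything an engine holds to the shared pool."""
        self.allocations.pop(engine, None)


def run_demo() -> dict:
    # Assumed cluster size and request sizes, chosen only for illustration.
    rm = ToyResourceManager(total_memory_mb=8192)
    results = {
        "mapreduce": rm.request("mapreduce", 4096),
        "storm": rm.request("storm", 2048),
        "hive": rm.request("hive", 4096),  # refused: less than 4096 MB is free
    }
    rm.release("mapreduce")  # the batch job finishes and frees its memory
    results["hive_retry"] = rm.request("hive", 4096)
    results["free_after"] = rm.free_memory_mb()
    return results


if __name__ == "__main__":
    print(run_demo())
```

The engines in this sketch never talk to each other; they only ask the shared manager for capacity. That is the point of the separation, although a real scheduler also handles queues, priorities and per-node placement, which this toy omits.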


Efficiency & Shared Resources

YARN’s dynamic resource allocation doubled Hadoop’s processing power, while providing the same predictable performance and quality of service. YARN provides a fabric of stable, shared resources across multiple co-ordinated workloads:

  • Batch – MapReduce
  • Script – Pig
  • Interactive SQL – Hive, Tez and ORCFile
  • NoSQL – HBase and Accumulo
  • Stream – Storm
  • Search – Solr
  • In-Memory (preview) – Spark
  • And More to Come…

Essential Timeline

YARN: Initiation
  • MAPREDUCE-279 (Jan 2008)
  • Merged to trunk (Oct 2011)
YARN: Stabilization
  • YARN in UAT (Yahoo!)
  • YARN in Production (Yahoo!)
  • Hadoop 2 Beta (Aug 2013)
YARN: Implementation
  • YARN/Hadoop 2.0 GA (Oct 2013)
  • HDP 2.0

