YARN, the Hadoop Operating System

Making Hadoop a True Multi-Use Data Platform

Apache Hadoop YARN is the data operating system for Hadoop 2. YARN enables a user to interact with all data in multiple ways simultaneously, making Hadoop a true multi-use data platform and allowing it to take its place in a modern data architecture.

Initiative Goals

Flexibility
Enable data processing models beyond MapReduce (batch), such as interactive, streaming and search.
Efficiency
Double the processing in Hadoop on the same hardware while providing predictable performance and quality of service.
Resource Sharing
Provide a stable, common set of shared resources across multiple, co-ordinated workloads in Hadoop.

Status

YARN and Hadoop 2 originated in a Jira ticket (MAPREDUCE-279) raised in January 2008 by Hortonworks co-founder Arun Murthy. YARN is the result of five years of development and now ships as part of Hadoop, as Apache Hadoop YARN.

YARN has been tested by Yahoo! since September 2012 and has been in production across 30,000 nodes and 325PB of data since January 2013. More recently, other enterprises such as Microsoft, eBay, Twitter, XING and Spotify have adopted a YARN-based architecture.

Flexibility

By separating Hadoop's original processing engine (MapReduce) from resource management, YARN becomes the operating system for Hadoop. This means that many different processing engines can operate simultaneously across a Hadoop cluster, on the same data, at the same time.
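
To make the separation concrete, here is a minimal toy sketch in Python. It is not the YARN API: the class ToyResourceManager, its methods, and the application names are all hypothetical, chosen only to illustrate how a resource manager that knows nothing about any particular engine can hand out capacity from one shared pool to batch, streaming and interactive workloads at once.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster-wide resource manager (not the YARN API)."""

    total_containers: int
    # Maps an application name to the number of containers it currently holds.
    granted: dict = field(default_factory=dict)

    def free(self) -> int:
        """Containers in the shared pool that no application holds."""
        return self.total_containers - sum(self.granted.values())

    def request(self, app: str, wanted: int) -> int:
        """Grant up to `wanted` containers and return how many were granted."""
        given = min(wanted, self.free())
        self.granted[app] = self.granted.get(app, 0) + given
        return given

    def release(self, app: str) -> None:
        """Return all of an application's containers to the shared pool."""
        self.granted.pop(app, None)


# Three different engines draw on the same pool; the manager treats them alike.
rm = ToyResourceManager(total_containers=10)
batch = rm.request("mapreduce-job", 6)    # batch engine
stream = rm.request("storm-topology", 3)  # streaming engine
query = rm.request("hive-query", 4)       # interactive engine; pool is nearly full
rm.release("mapreduce-job")               # capacity returns for other workloads
```

In this sketch the manager only tracks capacity and never inspects what kind of work an application does, which is the design point the paragraph above describes. A real scheduler would also handle queues, priorities and node placement.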

Efficiency & Shared Resources

YARN’s dynamic resource allocation doubled Hadoop’s processing power, while providing the same predictable performance and quality of service. YARN provides a fabric of stable, shared resources across multiple co-ordinated workloads:

  • Batch – MapReduce
  • Script – Pig
  • Interactive SQL – Hive, Tez and HCatalog
  • NoSQL – HBase and Accumulo
  • Stream – Storm
  • Search
  • And More to Come…

Essential Timeline

YARN : Initiation
  • MAPREDUCE-279 (Jan 2008)
  • Merged to trunk (Oct 2011)
Delivered: merge to trunk (Oct 2011)
YARN : Stabilization
  • YARN in UAT (Yahoo!)
  • YARN in Production (Yahoo!)
  • Hadoop 2 Beta (Aug 2013)
Delivered: Aug 2013
YARN : Implementation
  • YARN/Hadoop 2.0 GA
  • HDP 2.0
Delivered: Hadoop 2 GA (Oct 2013)

