YARN, the Hadoop Operating System
Apache Hadoop YARN is the data operating system for Hadoop 2.0. YARN enables a user to interact with all data in multiple ways simultaneously, making Hadoop a true multi-use data platform and allowing it to take its place in a modern data architecture.
YARN and Hadoop 2 originated in a Jira ticket (MAPREDUCE-279) raised in January 2008 by Hortonworks co-founder Arun Murthy. YARN is the result of five years of development and ships as part of Hadoop under the name Apache Hadoop YARN.
YARN has been tested by Yahoo! since September 2012 and has been in production across 30,000 nodes and 325PB of data since January 2013. More recently, other enterprises such as Microsoft, eBay, Twitter and Xing have adopted a YARN-based architecture.
By separating Hadoop's original processing engine (MapReduce) from resource management, YARN effectively acts as an operating system for Hadoop. This means that many different processing engines can operate simultaneously across a Hadoop cluster.
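The separation described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the YARN API: the class `ToyResourceManager`, its methods, the engine names, and the memory figures are all hypothetical and chosen for illustration. It shows one shared resource pool handing out capacity to several independent processing engines, none of which manages cluster resources itself.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster-wide resource manager (not the YARN API)."""
    total_memory_mb: int
    allocations: dict = field(default_factory=dict)  # engine name -> MB granted

    @property
    def free_memory_mb(self) -> int:
        # Capacity not currently held by any engine.
        return self.total_memory_mb - sum(self.allocations.values())

    def request(self, engine: str, memory_mb: int) -> bool:
        """Grant memory to an engine if the pool has room; otherwise refuse."""
        if memory_mb <= 0 or memory_mb > self.free_memory_mb:
            return False
        self.allocations[engine] = self.allocations.get(engine, 0) + memory_mb
        return True

    def release(self, engine: str) -> None:
        """Return everything an engine holds to the shared pool."""
        self.allocations.pop(engine, None)


# Three hypothetical engines share one pool of 8192 MB.
rm = ToyResourceManager(total_memory_mb=8192)
granted_batch = rm.request("batch-mapreduce", 4096)
granted_stream = rm.request("stream-engine", 2048)
granted_big = rm.request("interactive-sql", 4096)    # refused: only 2048 MB free
rm.release("batch-mapreduce")                         # batch job finishes
granted_retry = rm.request("interactive-sql", 4096)  # now fits
```

In this sketch the engines only ask for and return capacity; all bookkeeping lives in the manager, which is the design point the paragraph above makes about moving resource management out of MapReduce.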
Efficient and Shared
YARN has been shown to enable double the processing in Hadoop on the same hardware while providing predictable performance and quality of service. YARN provides a fabric of stable, shared resources across multiple coordinated workloads.
- Management and Monitoring
- High Availability
- Disaster Recovery
- MAPREDUCE-279 (Jan 2008)
- Merged to trunk (Oct 2011)
- YARN in UAT (Yahoo!, Sep 2012)
- YARN in Production (Yahoo!, Jan 2013)
- Hadoop 2 Beta (Aug 2013)
- YARN/Hadoop 2.0 GA
- HDP 2.0
- Hadoop 2 GA (Oct 2013)
Concepts on YARN:
- Introducing Apache Hadoop YARN
- Apache Hadoop YARN – Background and an Overview
- Apache Hadoop YARN – Concepts and Applications
- Apache Hadoop YARN – ResourceManager
- Apache Hadoop YARN – NodeManager
- Running existing applications on Hadoop 2 YARN
- Stabilizing YARN APIs for Apache Hadoop 2
- Management of Application Dependencies
- Resource Localization in YARN: Deep Dive