Philosophy behind YARN Resource Management

YARN is part of the next-generation Hadoop cluster compute environment. It provides a generic and flexible resource management framework for administering the compute resources in a Hadoop cluster. The YARN application framework allows multiple applications to negotiate resources for themselves and perform their application-specific computations on a shared cluster. Thus, resource allocation lies at the heart of YARN.

YARN ultimately opens up Hadoop to additional compute frameworks, like Tez, so that applications can optimize compute for their specific requirements.

The YARN Resource Manager service is the central controlling authority for resource management and makes the allocation decisions. It exposes a Scheduler API that is specifically designed to negotiate resources, not to schedule tasks. Applications can request resources at different layers of the cluster topology, such as nodes and racks. The scheduler determines how much to allocate and where, based on resource availability and the configured sharing policy.
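To make the idea of requesting resources at different topology layers concrete, here is a minimal sketch in Python. It is a toy model, not the YARN Scheduler API: the names `ResourceRequest`, `ToyScheduler`, `free_mb` and `rack_of` are hypothetical, memory is the only resource tracked, and the sharing policy is left out entirely. It only shows an allocator that answers "how much and where" from current availability.

```python
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    """Hypothetical request for `count` containers of `memory_mb` each.

    `location` is a node name, a rack name, or "*" for anywhere in the cluster.
    """
    location: str
    memory_mb: int
    count: int


class ToyScheduler:
    """Illustrative allocator only; not the YARN scheduler implementation."""

    def __init__(self, free_mb, rack_of):
        self.free_mb = dict(free_mb)  # node name -> free memory in MB
        self.rack_of = dict(rack_of)  # node name -> rack name

    def _candidates(self, location):
        """Nodes that satisfy the requested topology layer."""
        if location == "*":
            return sorted(self.free_mb)
        if location in self.free_mb:
            return [location]
        return sorted(n for n, r in self.rack_of.items() if r == location)

    def allocate(self, request):
        """Return the nodes where containers were granted (may be fewer than asked)."""
        granted = []
        for _ in range(request.count):
            node = next(
                (n for n in self._candidates(request.location)
                 if self.free_mb[n] >= request.memory_mb),
                None,
            )
            if node is None:
                break  # nothing at this layer has enough free memory
            self.free_mb[node] -= request.memory_mb
            granted.append(node)
        return granted


scheduler = ToyScheduler(
    free_mb={"node1": 2048, "node2": 4096, "node3": 1024},
    rack_of={"node1": "rackA", "node2": "rackA", "node3": "rackB"},
)
node_grant = scheduler.allocate(ResourceRequest("node1", 1024, 3))
rack_grant = scheduler.allocate(ResourceRequest("rackA", 2048, 1))
any_grant = scheduler.allocate(ResourceRequest("*", 1024, 2))
```

In this sketch the node-level request is only partly satisfied because `node1` runs out of memory, while the rack-level and cluster-level requests are placed on whichever matching node still has room. Note that the allocator never looks at tasks, only at containers and nodes.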

Currently, there are two sharing policies: fair scheduling and capacity scheduling. The Scheduler API thus reflects the Resource Manager’s role as the resource allocator. This API design is also crucial for Resource Manager scalability because it bounds the complexity of its operations by the size of the cluster rather than by the tasks running on the cluster.

The actual task scheduling decisions are delegated to the application manager that runs the application logic (in YARN terminology, the per-application ApplicationMaster). It decides when, where and how many tasks to run within the resources allocated to it. It has the flexibility to choose its own locality, co-scheduling, co-location and other scheduling strategies.
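The application side of this split can be sketched in the same spirit. The Python function below is a hypothetical illustration, not code from any YARN application: `assign_tasks`, the task names and the node names are all made up. It assumes the application has already been granted a set of containers and shows one possible strategy, which is to place data-local tasks first and then fill whatever containers remain.

```python
def assign_tasks(tasks, containers):
    """Map task name -> node of the container it runs in.

    `tasks` maps a task name to the node holding its input data (hypothetical).
    `containers` lists the nodes of containers already granted to the application.
    Tasks that cannot be placed are omitted and wait for more containers.
    """
    free = list(containers)
    placement = {}

    # First pass: place tasks on a container that sits on their preferred node.
    for task, preferred in tasks.items():
        if preferred in free:
            free.remove(preferred)
            placement[task] = preferred

    # Second pass: remaining tasks take any leftover container.
    for task in tasks:
        if task not in placement and free:
            placement[task] = free.pop(0)

    return placement


tasks = {"map0": "node2", "map1": "node1", "map2": "node3", "map3": "node2"}
containers = ["node1", "node2", "node2"]
placement = assign_tasks(tasks, containers)
```

Here three of the four tasks get data-local containers and `map2` is left waiting, since no container was granted on `node3` and none is left over. A different application could swap in a completely different strategy without any change on the resource allocation side.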




Fundamentally, YARN resource scheduling is a two-step framework: resource allocation is done by YARN, and task scheduling is done by the application. This allows YARN to be a generic compute platform while still allowing flexibility in scheduling strategies. An analogy is a general-purpose operating system that allocates computer resources among concurrent processes.

We envision YARN as the cluster operating system. This two-step approach may be slower than custom scheduling logic, but we believe that such problems can be alleviated by careful design and engineering. Having the custom scheduling logic reside inside the application allows the application to run on any YARN cluster. This is important for creating a vibrant ecosystem of YARN applications (Tez is a good example) that can be easily deployed on any YARN cluster. Developing YARN scheduling libraries will reduce the developer effort needed to create application-specific schedulers, and YARN-103 is a step in that direction.



April 26, 2013 at 8:48 am

quotation: “The actual task scheduling decisions are delegated to the application manager that runs the application logic.” Is the ‘application manager’ == ‘application master’ ? Sorry for this stupid question, but I am a bit confused of the used terminology…

