This post is from Vinod Kumar Vavilapalli of Hortonworks and Chris Douglas and Carlo Curino of Microsoft Research.
Great news from the Apache Hadoop YARN community! A paper describing Apache Hadoop YARN was accepted at the 2013 ACM Symposium on Cloud Computing (SoCC 2013), where it won the best paper award! Here’s the title and abstract:
Apache Hadoop YARN: Yet Another Resource Negotiator [Industrial Paper]
The initial design of Apache Hadoop was tightly focused on running massive, MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agorá—the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs’ control flow, which resulted in endless scalability concerns for the scheduler.
In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop’s compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN on production environments (including 100% of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, Tez.
You can access the full paper here.
We are proud of this award. With the Apache Hadoop 2 GA release right around the corner, recognition of its potential validates all the hard work that’s gone into the YARN project. We are equally humbled by the challenges still ahead of us, as we work to deliver on the promise of this platform. We hope this paper can open YARN to new audiences of developers and researchers; we welcome them to our community.
As you can see from the author list and the acknowledgements in the paper, this gigantic effort wouldn’t have been possible without the extraordinary work of so many people. YARN has been, and continues to be, a completely community-driven project. Our congratulations and thanks to everyone who contributed to YARN.
The full list of papers accepted into SoCC 2013 is here.