Meet the Committer: 3 Minutes on Apache Hadoop YARN with Arun Murthy

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week, as Hadoop 2 goes GA, Arun Murthy discusses his journey with Hadoop. The journey has taken Arun from developing Hadoop, to founding Hortonworks, to this week’s release of Hadoop 2, with its YARN-based architecture.

Arun describes the difference between MapReduce and YARN, and how they are related in Hadoop 2 (and by extension in Hortonworks Data Platform v2).

YARN turns Hadoop from a single-use system for batch data processing into a multi-use platform that can store and process data in many ways beyond batch.

MapReduce used to do two things at once: data processing and resource management. Now YARN does resource management, and MapReduce is just another application that runs natively in Hadoop. With the launch of Hadoop 2, YARN is the Hadoop operating system.

Now other applications can run simultaneously IN Hadoop as peers to MapReduce:

  • Tez can do interactive query
  • Storm can handle streaming data
  • Giraph can handle graphs
  • And so on…
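The split described above can be pictured with a toy model. This is only an illustrative sketch, not the YARN API: the `ResourceManager` class, its `request` and `release` methods, the slot counts, and the application names are all hypothetical. It shows one shared component handing out cluster capacity while several kinds of applications, MapReduce among them, ask for it as peers.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical stand-in for YARN's role: it only tracks and grants capacity."""
    total_slots: int
    allocations: dict = field(default_factory=dict)

    def free_slots(self) -> int:
        # Capacity not yet granted to any application.
        return self.total_slots - sum(self.allocations.values())

    def request(self, app_name: str, slots: int) -> bool:
        # Grant the request only if enough capacity remains.
        if slots <= 0 or slots > self.free_slots():
            return False
        self.allocations[app_name] = self.allocations.get(app_name, 0) + slots
        return True

    def release(self, app_name: str) -> None:
        # Return everything an application holds to the shared pool.
        self.allocations.pop(app_name, None)


# MapReduce is just one application among several sharing the same cluster.
rm = ResourceManager(total_slots=10)
granted = {
    "mapreduce-batch": rm.request("mapreduce-batch", 4),
    "tez-query": rm.request("tez-query", 3),
    "storm-stream": rm.request("storm-stream", 2),
    "giraph-graph": rm.request("giraph-graph", 2),  # refused: only 1 slot left
}
print(granted, "free:", rm.free_slots())
```

In this sketch the resource manager knows nothing about how any application processes data. It only decides who gets capacity, which is the separation of concerns the interview describes.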

Learn more about YARN here or at the Apache Hadoop project site


