Category Archives: YARN


Moving Hadoop Beyond Batch with Apache YARN

Apache Hadoop 2.0 continues to make its way through the open source community process at the Apache Software Foundation and is getting closer to being declared “ready” from a community development perspective.  Once ready, our team at Hortonworks will apply our usual enterprise rigor in providing a tested and integrated distribution that includes Hadoop 2.0 along with the other enterprise-focused services our customers and partners require.

In my roles both at Hortonworks and in the open-source Apache Hadoop community, I’m asked a lot of questions regarding the key aspects and motivations behind Hadoop 2.0. Here is some information to sate the curious mind.

First-generation success inspires second-generation focus

In the early days of Hadoop at Yahoo!, we had a very particular objective: store and process very large amounts of data to support our internet search efforts.  And so the first generation of Hadoop was a purpose-built system for web-scale data processing that was embraced by Yahoo! as well as other technology-savvy early adopters such as Facebook.

As usage at Yahoo! began to expand so did the number of ways that users wanted to interact with the data stored in Hadoop. As with any successful open-source project, the broader ecosystem of Hadoop users responded by contributing additional capabilities to the Hadoop community, with some of the most popular examples being Apache Hive for SQL-based querying, Apache Pig for scripted data processing and Apache HBase as a NoSQL database.

These additional open source projects opened the door for a much richer set of applications to be built on top of Hadoop – but they didn’t really address the design limitations inherent in Hadoop; specifically, that it was designed as a single application system with MapReduce at the core (i.e. batch-oriented data processing).

Do we need SQL ON Hadoop or SQL IN Hadoop?

Fast forward to today, and we see that Hadoop’s momentum has continued and many more enterprises (not just web scale companies) want to store ALL incoming data in Hadoop, and then enable their users to interact with it in a host of different ways: batch, interactive, analyzing data streams as they arrive, and more.  And most importantly, they need to be able to do this all simultaneously without any single application or query consuming all of the resources of the cluster to do so.

Nothing illustrates this dynamic more clearly than the current industry noise around SQL on Hadoop.  All kinds of vendors are clamoring to provide better SQL access to data stored in Hadoop – and so they should, since SQL is understood by many users.  Since Apache Hive has been the defacto SQL interface to Hadoop data for many years, we’ve found most users would like to continue to leverage the power of Hive in support of these additional interactive SQL use cases.

But by building SQL access on top of Hadoop, it just highlights the challenge of Hadoop being a single application system.  For when I run a SQL query on that data, it could consume all the resources of the cluster and cause performance issues for the other applications and jobs running in the cluster – not a good outcome to say the least.

YARN enables SQL IN Hadoop and many more applications

When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop to be able to run multiple applications against relevant data sets. And do so in a way where multiple types of applications can operate efficiently and predictably within the same cluster – this is really the reason behind Apache YARN, which is foundational to Hadoop 2.0.  By managing the resource requests across a cluster, YARN turns Hadoop from a single application system to a multi-application operating system.

Getting back to the SQL ON Hadoop point, with YARN we now have the ability to run SQL IN Hadoop. For by being IN Hadoop (built on YARN), it becomes part of the platform itself and can be managed by YARN to ensure that multiple use cases can be addressed. Why stop at SQL? What about machine learning or modeling? What about processing events (data) as they arrive? Would it be not nice to manage all of these through a common system?

Enter YARN.

By turning Apache Hadoop 2.0 into a multi application data system, YARN enables the Hadoop community to address a generation of new requirements IN Hadoop. YARN responds to these enterprise challenges by addressing the actual requirements at a foundational level rather than being commercial bolt-ons that complicate the environment for customers.

And so that is the trailer for the story for Hadoop 2.0: Unleashing the Power of YARN. Coming soon to a cluster near you, summer of 2013! Stay tuned!

Understanding Hadoop 2.0

In this post, we’ll explain the difference between Hadoop 1.0 and 2.0. After all, what is Hadoop 2.0? What is YARN?

For starters – what is Hadoop and what is 1.0? The Apache Hadoop project is the core of an entire ecosystem of projects. It consists of four modules (see here):

  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Hadoop 1.0 is based on the Hadoop .20.205 branch (it went 0.18 -> 0.19 -> 0.20 -> 0.20.2 -> 0.20.205 -> 1.0). Hard to follow? Check out this chart. Not hard for an open source developer, but obscure for an enterprise product – so everyone agreed to call 0.20.205 ’1.0′, the project having matured to that point.

Hadoop 2.0 is from the Hadoop 0.23 branch, with major components re-written to enable support for features like High Availability, and MapReduce 2.0 (YARN), and to enable Hadoop to scale out past 4,000 machines per cluster. Specifically, Hadoop 2.0 adds (see here):

  • HDFS Federation – multiple, redundant namenodes acting in congress
  • MapReduce NextGen aka YARN aka MRv2 – which transforms Hadoop into a full blown platform as a service. See here.

Hadoop 1.0 is rock-solid, Hadoop 2.0 is still in active development and is considered Alpha. Work continues to stabilize Hadoop 2.0, so stay tuned!

Microsoft’s Contributions to the Stinger Initiative and Apache Hive

Guest blog post from Eric Hanson, Principal Program Manager, Microsoft

Hadoop had a crazy and collaborative beginning as an OSS project, and that legacy continues. There have been over 1,200 contributors across 80 companies since its beginning. Microsoft has been contributing to Hadoop since October 2011, and we’re committed to giving back and keeping it open.

Our first wave of contributions, in collaboration with Hortonworks, has been to port Hadoop to Windows, to enable it both for our HDInsight service on Windows Azure and for on-premises Big Data installations on Windows. Now, we’re starting to contribute to the Stinger initiative to dramatically speed up Hive and make it more enterprise-ready.

Contribution to the core of Apache Hadoop through Stinger

Our main activity in Stinger right now is around Tez, and vectorized query execution. One of our developers, Mike Liddell, has experience with DAG-based computations in Microsoft’s internal Dryad-LINQ effort, and has just joined Tez as a founding committer. I kick-started and helped guide our project to introduce columnstore data formats and vectorized (a.k.a. “batch mode”) query execution into SQL Server 2012.  After moving to the SQL Server Big Data team, I’ve been collaborating with Hortonworks developers since late last fall regarding how to make Hive faster. We heard about the ORC project, led by Owen O’Malley of Hortonworks, to improve the RCFile columnstore format. I’ve had several productive design discussions with Owen about ORC, and we really like the way it’s shaping up.

Based on our experience, we knew that a great columnstore format is only part of the story about making data warehouse-style queries run really fast. Good process and communication architecture is one – Tez is a great step there. Another is fast query execution (QE), and vectorized query execution research and field experience has shown it can speed up queries on the order of 10X-100X.

Some people were saying that fast QE required a total-rewrite in C++. I didn’t buy that, and I prototyped vectorized scan and filter operators in Java and shared this with Hortonworks. For simple conditions like column = constant, we’ve seen the ability to filter about 150 million rows per second on one thread in Java. We now have a two-company team introducing vectorized QE to Hive, consisting of two Hortonworks folks (Jitendra Pandey and Owen) and several Microsoft engineers. We’re going to take it in small steps, adding vectorized scans over ORC, and basic filter operations first. Then we’ll move on to vectorized aggregates and joins.

We think that the functional surface area of Hive, including its SQL query language, the open, extensible storage model over HDFS, and its easy programmer extensibility with Java UDFs, is quite compelling. It gives non-procedural access to Big Data, with ability for programmers to create custom Java add-ins that let them do complex calculations more easily that they can with Map-Reduce programs. Hive also has a strong community of OSS developers and users. It works on ultra-scale clusters on data sets vastly bigger than total cluster memory. Stinger aims to boost the speed of Hive to complement its rich functionality in a way that users will love.

An active participant in the open community

We’ve been part of OSS Big Data world for about a year and half now. Through the combined efforts of the overall Hadoop community, Microsoft, and Hortonworks, Hadoop is now accessible on Windows Server and Windows Azure. We’ve gained so much from the community. Now we’re helping return the favor by contributing to Stinger, with our eye on 100X performance gains.

Hortonworks Data Platform 2.0 Alpha 2 now available: focus on performance

We are very pleased to announce the Alpha 2 release of the Hortonworks Data Platform 2.0 (HDP 2.0 Alpha2) is now available for download!

A key focus in HDP 2.0 Alpha 2 is on performance as announced in the Stinger initiative, and includes a series of enhancements to the performance of Apache Hive for interactive SQL queries.  In fact HDP 2.0 Alpha 2 was used to perform the tests announced yesterday, showing a 45X performance increase using Hive.  There is much more to come but we are pleased with the early results, and encourage Hive users to take a look and continue to give us feedback.

Consistent with HDP 2.0 Alpha 1, this version is built from the developmental Apache Hadoop 2.0 line and includes Apache YARN, a next-generation resource-management and application framework that enables Hadoop to support an ever-expanding range of use cases.  We are extremely excited about the opportunities that YARN enables – for background, check out Arun Murthy’s blog post series where he provides a YARN overview.

Notable new components over Alpha 1 include:

  • Apache Tez: A new Apache project that provides an optimized data processing framework on top of YARN. Tez is a general-purpose, highly customizable framework that simplifies data processing tasks across both small-scale, low-latency and large-scale, high-throughput workloads in Hadoop. Tez can provide an order of magnitude performance boost for the broader ecosystem of data processing tools such as Apache Hive and Apache Pig.
  • Apache Hive Interactive Query: Beyond the speedups made possible by Apache Tez, several new features were added to speed Hive queries. A new file format called the ORCFile (optimized RC file) optimizes how data is stored and accessed in Hive, and significant query optimizations reduce latency and improve performance.

Note that Tez is not enabled by default.  Instructions for doing so, and allowing Hive to use Tez, are in the installation guide.

Learn More
Please take a look at the Hortonworks Documentation to learn more about installing and using HDP 2.0 Alpha 2.

Download It
You can download HDP 2.0 Alpha 2 from the Hortonworks Download site.

Tell Us About It
Please visit the HDP 2.0 Alpha Forum to ask questions, get help, provide feedback and hear what others are doing with HDP. 

We are excited about the opportunities that Hadoop 2 provides for the future of Hadoop and large-scale data processing. HDP 2.0 Alpha 2 is a key milestone that provides organizations with a packaged release to evaluate and gain experience with the upcoming Apache Hadoop 2 technology stack. We look forward to your feedback on HDP 2.0 Alpha 2 while we work with the community to make Hadoop 2 a stable reality. Enjoy!

Note: This Alpha release is a technology preview to gather feedback from outside of Hortonworks. Some features are missing or incomplete. Some APIs may change. Do not use Alpha 2 for production use. Keep away from open flame. Support is only available via Forums.

Apache Hadoop YARN Meetup II @Hortonworks

Introduction

Hortonworks hosted the second Apache Hadoop YARN meetup at Hortonworks office in Palo Alto on last Friday (22 February 2013). Following the success with the first one, this meetup continues to enjoy a good attendance from the YARN community. About 40 joined the meetup in person and nearly another 30 attended via phone/webex.

Meetup sessions

Update from Yahoo!

The Yahoo! grid team responsible for YARN rollout on their clusters gave an update of the current deployments and their state. Robert Evans and others from their team threw some very impressive numbers about the YARN clusters – 10s of million jobs till now on YARN, averaging ~100,000 jobs on some clusters per day. Please go ahead and read their recent blog on Yahoo! developer network: Hadoop at Yahoo!: More Than Ever Before. They then fielded several questions from the community like any pain-points for the users during the upgrade, big issues that only surfaced at scale. The software is deemed sufficiently stable, churning jobs out impressively and with maximum uptime with downtime mostly happening during upgrades.

Bikas Saha on ResourceManager restart functionality

After the update from Yahoo!, Bikas Saha from Hortonworks talked about the ResoureManager restart functionality. Most of his work is captured on the Apache JIRA issue YARN-128. The effort is divided into phases and the first phase involves:

  • Putting in place mechanisms to save application state and read them back after restart.
  • Upon restart, the NM’s are asked to reboot and the previously running AM’s are restarted.
  • AMs which support recovery automatically read back their own saved state on restart and try to resume work from before RM restart.

This first phase to restart all the running applications on RM recovery is done and shipped with the latest hadoop release 2.0.3-alpha. He discussed the overall design on a whiteboard, explaining the implementation.

YARN at LinkedIn

Chris Riccomini then talked a bit about what he continues to do with YARN (see his notes from last meetup).

Arun C Murthy on CPU scheduling in YARN

Arun talked about the enhancements to YARN resource scheduling to also account for CPU cores in addition to the memory based scheduling we already have. This effort is capture on Apache JIRA at YARN-2. Arun walked us through the DRF algorithm on which this work is based on, described various scheduling scenarios and summed it up with possible future directions.

Alejandro also gave a brief summary about adding support for CPU isolation/monitoring of containers. YARN-3 enhances YARN to use cgroups to control the cpu usage of containers. There is still a little work left to make this feature exposed to the end-users.

CPU scheduling and support for isolation via cgroups in YARN are both available in the most recent hadoop release 2.0.3-alpha. Both these features are big steps for YARN in realizing its goal of becoming the foremost generic resource management layer and making Hadoop the ‘distributed operating system’ on which rest of the data systems build on.

YARN progress and Roadmap

I did a quick recap of what we discussed in the last YARN meetup and what we’ve achieved so far. Few things, the community has delivered on its promises from last time:

Libraries for helping application writers: YARN-418 is the umbrella ticket for tracking this and we made quite some progress. YARN-29 helps application submission, YARN-103 is helpful to simply the usage of the AM RM protocol.
CPU isolation and scheduling: YARN-2 and YARN-3 are checked in as noted above
RM restart: The first phase to just restart running AMs and NMs is already in as part of YARN-128.

I then summed it up with our roadmap. The goal of YARN community for the next version of hadoop is to address some rough corners in YARN that are thwarting its march beyond its alpha use. Some areas of focus include:

  • RM restart: Finish testing RM restart at scale and progress toward the next phases
  • YARN usability: Address the minor and major usability issues with YARN client and web interfaces.
  • YARN API cleanup: Cleanup YARN APIs now itself to make them future proof and help us support backwards compatibility of stable APIs into the future
  • Security: YARN’s mostly been secure from the word get-go, a couple of minor things need to addressed to close this loop.

Conclusion

Thanks to everyone for making YARN meetups a continued success story. All help is welcome from the community to focus on solidifying our next release. Looking forward to meeting you all again at the next meetup!

The Fastest Path to Innovation: Community Driven Open Source

 

blogpicLast week, we outlined our approach for delivering an enterprise viable Apache Hadoop distribution in the open.  Simply put: we believe the fastest way to innovate is to do our work within the open source community, introduce enterprise feature requirements into that public domain, and to work diligently to progress existing open source projects and incubate new projects to meet those needs.

In support of our approach, this week we’ve announced the submission of two new incubation projects to the Apache Software foundation together with the launch of the “Stinger Initiative”, all aimed at enhancing the security and performance of Hadoop applications.  These efforts focus on enterprise requirements that are essential to enable broad adoption across the Hadoop ecosystem.

  • The Stinger initiative aims to dramatically speed up Apache Hive in support of interactive query use cases.
  • The Knox Gateway addresses the need for a single point of authentication and secure access for Apache Hadoop services in a cluster.
  • The Tez framework provides an alternative next-generation runtime built on Hadoop YARN that significantly improves latency and throughput of Hadoop applications.

We feel these efforts are strong examples of our commitment to driving innovation from within the open source community, and as stated in our approach blog, we do this by::

  • identifying and articulating the enterprise requirements within the community,
  • taking an active role in addressing those requirements within the community, and
  • applying enterprise rigor to the build, test and release process to ensure that the open source projects as well as the larger product distribution we provide is enterprise grade and interoperable with other elements in the enterprise.

Since it takes a community to build enterprise-class platforms like Hadoop, if you have interest in helping with Knox, Tez, or Stinger, we encourage you to work with us and the others in the Apache community!

Introducing… Tez: Accelerating processing of data stored in HDFS

 

MapReduce has served us well.  For years it has been THE processing engine for Hadoop and has been the backbone upon which a huge amount of value has been created.  While it is here to stay, new paradigms are also needed in order to enable Hadoop to serve an even greater number of usage patterns.  A key and emerging example is the need for interactive query, which today is challenged by the batch-oriented nature of MapReduce.  A key step to enabling this new world was Apache YARN and today the community proposes the next step… Tez

What is Tez?

Tez – Hindi for “speed” – (currently under incubation vote within Apache) provides a general-purpose, highly customizable framework that creates simplifies data-processing tasks across both small scale (low-latency) and large-scale (high throughput) workloads in Hadoop. It generalizes the MapReduce paradigm to a more powerful framework by providing the ability to execute a complex DAG (directed acyclic graph) of tasks for a single job so that projects in the Apache Hadoop ecosystem such as Apache Hive, Apache Pig and Cascading can meet requirements for human-interactive response times and extreme throughput at petabyte scale (clearly MapReduce has been a key driver in achieving this).

With the emergence of Apache Hadoop YARN as the basis of next generation data-processing architectures, there is a strong need for an application which can execute a complex DAG of tasks which can then be shared by Apache Pig, Apache Hive, Cascading and others.  The constrained DAG expressible in MapReduce (one set of maps followed by one set of reduces) often results in multiple MapReduce jobs which harm latency for short queries (overhead of launching multiple jobs) and throughput for large-scale queries (too much overhead for materializing intermediate job outputs to the filesystem). With Tez, we introduce a more expressive DAG of tasks, within a single application or job, that is better aligned with the required processing task – thus, for e.g., any given SQL query can be expressed as a single job using Tez.

The below graphic illustrates the advantages provided by Tez for complex SQL queries in Apache Hive or complex Apache Pig scripts.

pighivetez

Tez is critical to the Stinger Initiative and goes a long way in helping Hive support both interactive queries and batch queries. Tez provides a single underlying framework to support both latency and throughput sensitive applications, there-by obviating the need for multiple frameworks and systems to be installed, maintained and supported, a key advantage to enterprises looking to rationalize their data architectures. .

Essentially, Tez is the logical next step for Apache Hadoop after Apache Hadoop YARN. With YARN the community generalized Hadoop MapReduce to provide a general-purpose resource management framework (YARN) where-in MapReduce became merely one of the applications that could process data in your Hadoop cluster. With Tez, we build on YARN and our experience with the MapReduce to provide a more general data-processing application to the benefit of the entire ecosystem i.e. Apache Hive, Apache Pig etc.

What has been completed? Where can Tez go?

An early version of the project has been donated to the ASF as part of the initial code grant to establish the Incubation project.   Through the work done in the Stinger initiative, it is already clear that Tez enables and order of magnitude increase in the performance of Apache Hive.

The community is also designing a re-usable set of libraries of data-processing primitives such as sorting, merging, data-shuffling, intermediate data management etc. which are necessary for Tez and may be used directly by other projects.  This is just the beginning.  It is an extensible architecture that will undoubtedly be contributed to widely.

For the community, by the community

At Hortonworks we believe that innovation happens fastest by working with a community of like-minded individuals to address the requirements for Hadoop without being bounded by artificial boundaries such as employment. As such, even though the Hortonworks MapReduce/Hive/Pig team seeded the project, we’ve had the benefit of both positive feedback and constructive criticism from several leading contributors and committers across the Apache Hadoop MapReduce, Apache Hive & Apache Pig projects.  These inventors and peers are employed at Hortonworks, Yahoo, Facebook, Microsoft, Twitter and many others.  The initial committer list has 22 participants with deep domain expertise in these unique challenges and comprises a who’s who in the Hadoop world.  Of course, now that we are nearly in a position where we can co-develop via the Apache Software Foundation where we have proposed Tez as an Incubator project, we expect a very quick acceleration of project development.

When will it be available?

We plan to donate the code from our internal repository to the ASF as part of the Incubator proposal.  Also, Hortonworks will ship Tez in the next alpha release of Hortonworks Data Platform 2 (HDP2), based on Apache Hadoop 2.0, very soon to showcase some of the very exciting advances we have made for Apache Hive via Project Stinger.

We are very excited by the reception Tez has received so far, and we do hope you can join us in this initiative via the Apache Software Foundation project to make Hadoop better!

Apache Hadoop YARN Meetup at Hortonworks – ReCap!

Introduction

The Apache Hadoop YARN meetup at Hortonworks on October 12, 2012 we previously announced was a resounding success. We had a very good turnout of around seventy people from the community.

Meetup sessions

Deployments at Yahoo!

The meetup kicked off with YARN committers from Yahoo presenting on current Hadoop 2.0 deployments at Yahoo. As part of the presentation, the following were covered.

  • described scenarios where YARN positively advanced the state of the art like scalability, its current stability, the power of the YARN web-services, and its superlative performance compared to the previous versions.
  • efforts undergone relation to battle testing YARN including application validation and performance benchmarking.
  • summed it up with suggestions for improvements to issues like UI loading, lack of generic history server etc.

Chris Riccomini’s on “Building Applications on YARN”


Chris Riccomini from LinkedIn then presented about his experience in “Building Applications on YARN”. He briefly covered the anatomy of a YARN application and then jumped into various dimensions a YARN application developer should think about – deployment, metrics, logging, application specific configuration to name a few.

The most interesting bits about his presentation include how, pre-production, small instances of YARN clusters can be utilized to develop applications in an agile manner. For example, one could start with using local file system and avoiding HDFS to minimize the operational effort, and then switch over to a full-blown distributed file system when the desire for scalability crosses a threshold. Also worth attention is how YARN’s web-service APIs can be exploited to build custom dashboards.

Chris posted his notes from the meetup and slides on his blog.

YARN API Discussion

After that, Arun recapped the YARN’s powerful scheduling API available to the application developers for using the cluster resources. He walked us through the scheduling concepts, and rounded it up with how scheduling happens in the context of an example MapReduce job.

Bikas and I then proceeded to give a brief overview of what all APIs are available to application developers. We described some of the pain points with the APIs that various users indicated in the recent past and efforts underway to address some of them. To enumerate a few:

  • How to make the scheduling logic explicit – for e.g, that scheduler looks for free resources on a node, then proceeds to a rack and then off-rack
  • Multiple ways to release and reject containers
  • Use-cases which require resources on specific nodes and/or racks
  • Applications that want to avoid/blacklist some nodes and/or racks
  • Limitations on the number of threads making resource requests

We opened the API discussion for further feedback. This exercise was very fulfilling. We discovered how various users were experimenting with the APIs and what pitfalls and limitations they ran into. Some concrete suggestions include:

  • Libraries for recovering AMs, launching containers
  • A generic framework for applications to expose specific data via http or web-services.
  • A generic application history server
  • Tagging nodes with labels like GPU etc and use these labels for scheduling. This is an extension of data locality

Our slides are available here.

Efforts Underway

After a short break, Alejandro Abdelnur from Cloudera briefly talked about the efforts underway to augment YARN with cpu-isolation using cgroups.

Finally, Siddarth Seth from Hortonworks talked about his work on modifying MR application to reuse containers for jobs both large and small. This exciting development opens new innovations in the MapReduce land like intermediate output aggregation. You can read through Sid’s presentation below. The core points covered are:

  • Decoupling the TaskAttempt and Container concepts inside MR AM
  • Add new first class concepts of Container, Node and Scheduler
  • The current state of the effort
  • New avenues this transition opens up – custom task types, output aggregation, performance optimizations.

His slides are available here.

Conclusion

The success of this meetup reaffirmed the excitement of the community about YARN. This also strengthened our desire to make it a recurring event. We look forward to the next one, with hopefully more turnout, extended brainstorming, and of course, more pizza and beer :)

Apache Hadoop 2.0.2-alpha Released!

It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.0.2-alpha.

This is the second (alpha) release of the next generation release of Apache Hadoop 2.x and comes with significant enhancements to both the major components of Hadoop:

  • HDFS HA has undergone significant enhancements since the previous release for NameNode High Availability
  • YARN has undergone significant testing and stabilization and validation as is been heavily battle-tested since the previous release.

These are exciting times indeed for the Apache Hadoop community – personally, this is very reminiscent of the period in 2009 when we finally saw the light at the end of the tunnel during the stabilization of Apache Hadoop 1.x (then called Apache Hadoop 0.20.x). A déjà vu, if you will – albeit of the pleasant kind! Yes, we have a few miles to clock, but it feels like the hardest part is already behind us. At the time of release, YARN has already been deployed on super-sized clusters with 2,000 nodes and 3,600 nodes (totaling to nearly 6,000 nodes) at Yahoo alone*.

Going forward, I have no doubt that we are well of our way to sign-off on hadoop-2.x early next year and we are now heads down wrapping up the last of feature work since we have a reasonably stable base, such as:

  • HDFS HA without need for shared storage (already merged into Hadoop trunk sans a couple of design enhancements).
  • YARN ResourceManager availability.
  • YARN scheduling enhancements such as multi-resource scheduling (nearly complete, should be committed soon) and preemption.

Having said that, it’s critical for the developer community to get feedback on hadoop-2.x from the user community to ensure we continue to deliver great software – so, please, do go ahead, download the bits from the Apache Hadoop releases page, try the release and give us your valuable feedback – it’s a personal request! Of course, if you prefer a fully packaged and integrated stack you can browse to the Hortonworks Downloads page to try Hortonworks Data Platform 2.0 Alpha which integrates Hadoop 2.0.2-alpha with other important components such as Apache HBase, Apache Pig, Apache Hive, Apache HCatalog, Apache ZooKeeper and Apache Oozie

For more information about the HDP 2.0 alpha, you can check out our blog post from yesterday.

Acknowledgements
I’d like to thank everyone who has or continues to contribute to Apache Hadoop – everyone in the community. A special mention for Todd Lipcon for his contributions to HDFS HA and the Yahoo Hadoop team (Robert Evans, Thomas Graves, Daryn Sharp, Jason Lowe and everyone else) for their help in getting YARN to stability and large-scale deployments on their clusters.

*Yahoo is currently running hadoop-0.23.4 release which essentially is hadoop-2.0.2-alpha without HDFS high availability.

Hortonworks Data Platform 2.0 Alpha is Now Available for Preview!

We are very excited to announce the Alpha release of the Hortonworks Data Platform 2.0 (HDP 2.0 Alpha).

HDP 2.0 Alpha is built around Apache Hadoop 2.0, which improves availability of HDFS with High Availability for the NameNode along with several performance and reliability enhancements. Apache Hadoop 2.0 also significantly advances data processing in the Hadoop ecosystem with the introduction of YARN, a generic resource-management and application framework to support MapReduce and other paradigms such as real-time processing and graph processing.

In addition to Apache Hadoop 2.0, this release includes the essential Hadoop ecosystem projects such as Apache HBase, Apache Pig, Apache Hive, Apache HCatalog, Apache ZooKeeper and Apache Oozie to provide a fully integrated and verified Apache Hadoop 2.0 stack

Apache Hadoop 2.0 is well on the path to General Availability, and is already deployed at scale in several organizations; but it won’t get to the current maturity levels of the Hadoop 1.0 stack (available in Hortonworks Data Platform 1.x) without feedback and contributions from the community.

Hortonworks strongly believes that for open source technologies to mature and become widely adopted in the enterprise, you must balance innovation with stability. With HDP 2.0 Alpha, Hortonworks provides organizations an easy way to evaluate and gain experience with the Apache Hadoop 2.0 technology stack, and it presents the perfect opportunity to help bring stability to the platform and influence the future of the technology.

Learn More
Please take a look at the Hortonworks Documentation to learn more about installing and using HDP 2.0 Alpha.

To learn more about Apache Hadoop YARN, Arun Murthy — Chair of Apache Hadoop PMC and YARN/MapReduce lead – and the rest of Hortonworks YARN development team, have a great four-part Blog series on the technology: one, two, three and four.

Download It
You can download the HDP 2.0 Alpha bits from the Hortonworks Download site.

Tell Us About It
Please visit the HDP 2.0 Alpha Forum to ask questions, get help, provide feedback and hear what others are doing with HDP.

Note: This Alpha release is early access and not for production use. Support is only available via Forums. Additionally, this is an early access release, you might find some incomplete features or a bit of instability.

We are excited about the opportunities that Hadoop 2.0 provides for the future of Hadoop and Big Data. The HDP 2.0 Alpha release is just the beginning. Enjoy!

YARN Meetup at Hortonworks on Friday, Oct 12

Hortonworks is hosting an Apache YARN Meetup on Friday, Oct 12, to solicit feedback on the YARN APIs. We’ve talked about YARN before in a four-part series on YARN, parts one, two, three and four.

YARN, or “Apache Hadoop NextGen MapReduce,” has come a long way this year. It is now a full-fledged sub-project of Apache Hadoop and has already been deployed on a massive 2,000 node cluster at Yahoo. Many projects, both open-src and otherwise, are porting to work in YARN such as Storm, S4 and many of them are in fairly advanced stages. We also have several individuals implementing one-off or ad-hoc application on YARN.

This meetup is a good time for YARN developers to catch up and talk more about YARN, it’s current status and medium-term and long-term roadmap.

Agenda includes:

  • YARN committers from Yahoo will present on current YARN deployments at Yahoo, including lessons learned, stability, etc.
  • Hortonworks YARN committers will talk about upcoming features such as RM Restart, Container Re-use for MR, Multi-resource scheduling etc.
  • Chris Riccomini from LinkedIn will talk about his experiences building new applications on top of YARN.

A WebEx session will be available, so attendees from all over the world can participate. Follow the meetup page for more information and updates to the agenda.

If you would like to add to the agenda, please get in touch with Arun, or leave a comment in the meetup page.

More information is available on meetup.com here: http://www.meetup.com/Hadoop-Contributors/events/85353562/.

Apache Hadoop YARN – NodeManager

Other posts in this series:
Introducing Apache Hadoop YARN
Apache Hadoop YARN – Background and an Overview
Apache Hadoop YARN – Concepts and Applications
Apache Hadoop YARN – ResourceManager
Apache Hadoop YARN – NodeManager

Apache Hadoop YARN – NodeManager


The NodeManager (NM) is YARN’s per-node agent, and takes care of the individual compute nodes in a Hadoop cluster. This includes keeping up-to date with the ResourceManager (RM), overseeing containers’ life-cycle management; monitoring resource usage (memory, CPU) of individual containers, tracking node-health, log’s management and auxiliary services which may be exploited by different YARN applications.

NodeManager Components

    1. NodeStatusUpdater

On startup, this component registers with the RM and sends information about the resources available on the nodes. Subsequent NM-RM communication is to provide updates on container statuses – new containers running on the node, completed containers, etc.

In addition the RM may signal the NodeStatusUpdater to potentially kill already running containers.

    1. ContainerManager

This is the core of the NodeManager. It is composed of the following sub-components, each of which performs a subset of the functionality that is needed to manage containers running on the node.

      1. RPC server: ContainerManager accepts requests from Application Masters (AMs) to start new containers, or to stop running ones. It works with ContainerTokenSecretManager (described below) to authorize all requests. All the operations performed on containers running on this node are written to an audit-log which can be post-processed by security tools.
      2. ResourceLocalizationService: Responsible for securely downloading and organizing various file resources needed by containers. It tries its best to distribute the files across all the available disks. It also enforces access control restrictions of the downloaded files and puts appropriate usage limits on them.
      3. ContainersLauncher: Maintains a pool of threads to prepare and launch containers as quickly as possible. Also cleans up the containers’ processes when such a request is sent by the RM or the ApplicationMasters (AMs).
      4. AuxServices: The NM provides a framework for extending its functionality by configuring auxiliary services. This allows per-node custom services that specific frameworks may require, and still sandbox them from the rest of the NM. These services have to be configured before NM starts. Auxiliary services are notified when an application’s first container starts on the node, and when the application is considered to be complete.
      5. ContainersMonitor: After a container is launched, this component starts observing its resource utilization while the container is running. To enforce isolation and fair sharing of resources like memory, each container is allocated some amount of such a resource by the RM. The ContainersMonitor monitors each container’s usage continuously and if a container exceeds its allocation, it signals the container to be killed. This is done to prevent any runaway container from adversely affecting other well-behaved containers running on the same node.
      6. LogHandler: A pluggable component with the option of either keeping the containers’ logs on the local disks or zipping them together and uploading them onto a file-system.
    1. ContainerExecutor

Interacts with the underlying operating system to securely place files and directories needed by containers and subsequently to launch and clean up processes corresponding to containers in a secure manner.

    1. NodeHealthCheckerService

Provides functionality of checking the health of the node by running a configured script frequently. It also monitors the health of the disks specifically by creating temporary files on the disks every so often. Any changes in the health of the system are notified to NodeStatusUpdater (described above) which in turn passes on the information to the RM.

    1. Security
      1. ApplicationACLsManagerNM needs to gate the user facing APIs like container-logs’ display on the web-UI to be accessible only to authorized users. This component maintains the ACLs lists per application and enforces them whenever such a request is received.
      2. ContainerTokenSecretManager: verifies various incoming requests to ensure that all the incoming operations are indeed properly authorized by the RM.
    2. WebServer

Exposes the list of applications, containers running on the node at a given point of time, node-health related information and the logs produced by the containers.

Spotlight on Key Functionality

    1. Container Launch

To facilitate container launch, the NM expects to receive detailed information about a container’s runtime as part of the container-specifications. This includes the container’s command line, environment variables, a list of (file) resources required by the container and any security tokens.

On receiving a container-launch request – the NM first verifies this request, if security is enabled, to authorize the user, correct resources assignment, etc. The NM then performs the following set of steps to launch the container.

      1. A local copy of all the specified resources is created (Distributed Cache).
      2. Isolated work directories are created for the container, and the local resources are made available in these directories.
      3. The launch environment and command line is used to start the actual container.
    1. Log Aggregation

Handling user-logs has been one of the big pain-points for Hadoop installations in the past. Instead of truncating user-logs, and leaving them on individual nodes like the TaskTracker, the NM addresses the logs’ management issue by providing the option to move these logs securely onto a file-system (FS), for e.g. HDFS, after the application completes.

Logs for all the containers belonging to a single Application and that ran on this NM are aggregated and written out to a single (possibly compressed) log file at a configured location in the FS. Users have access to these logs via YARN command line tools, the web-UI or directly from the FS.

    1. How MapReduce shuffle takes advantage of NM’s Auxiliary-services

The Shuffle functionality required to run a MapReduce (MR) application is implemented as an Auxiliary Service. This service starts up a Netty Web Server, and knows how to handle MR specific shuffle requests from Reduce tasks. The MR AM specifies the service id for the shuffle service, along with security tokens that may be required. The NM provides the AM with the port on which the shuffle service is running which is passed onto the Reduce tasks.

Conclusion

In YARN, the NodeManager is primarily limited to managing abstract containers i.e. only processes corresponding to a container and not concerning itself with per-application state management like MapReduce tasks. It also does away with the notion of named slots like map and reduce slots. Because of this clear separation of responsibilities coupled with the modular architecture described above, NM can scale much more easily and its code is much more maintainable.

Other posts in this series:
Introducing Apache Hadoop YARN
Apache Hadoop YARN – Background and an Overview
Apache Hadoop YARN – Concepts and Applications
Apache Hadoop YARN – ResourceManager
Apache Hadoop YARN – NodeManager

 

Four New Installments in ‘The Future of Apache Hadoop’ Webinar Series

During the ‘Future of Apache Hadoop’ webinar series, Hortonworks founders and core committers will discuss the future of Hadoop and related projects including Apache Pig, Apache Ambari, Apache Zookeeper and Apache Hadoop YARN.

Apache Hadoop has rapidly evolved to become the leading platform for managing, processing and analyzing big data. Consequently there is a thirst for knowledge on the future direction for Hadoop related projects. The Hortonworks webinar series will feature core committers of the Apache projects discussing the essential components required in a Hadoop Platform, current advances in Apache Hadoop, relevant use-cases and best practices on how to get started with the open source platform. Each webinar will include a live Q&A with the individuals at the center of the Apache Hadoop movement.

This four-part webinar series is now open for registration, and the schedule will include:

  • Wednesday, September 12 at 10:00 a.m. PT / 1:00 p.m. ET
  • Pig Out on Hadoop
    With: Alan Gates, Hortonworks founder and contributor to Apache Pig and HCatalog projects.
    Register here.

  • Wednesday, September 26 at 10:00 a.m. PT / 1:00 p.m. ET
  • Deployment and Management of Hadoop Clusters with Ambari
    With: Matt Foley, committer and PMC member of the Apache Hadoop Project and member of Technical Staff at Hortonworks.
    Register here.

  • Wednesday, October 17 at 10:00 a.m. PT / 1:00 p.m. ET
  • Scaling Apache Zookeeper for the Next Generation of Hadoop Applications
    With: Mahadev Konar, Hortonworks founder and contributor to the Apache Pig and HCatalog projects
    Register here.

  • Wednesday, October 31 at 10:00 a.m. PT / 1:00 p.m. ET
  • YARN: The Future of Data Processing with Apache Hadoop
    With: Arun C. Murthy, Hortonworks founder and VP of Apache Hadoop at Apache Software Foundation, the lead of the MapReduce project and YARN.
    Register here.

For more information, please register.

Previous webinars on “The Future of Apache Hadoop” are available here.

A press release is available here.

Click to Tweet: @Hortonworks unveils four new live webinars, with Q&A, on “The Future of Apache Hadoop” series http://bit.ly/OM0XpE #BigData #Hadoop

Apache Hadoop YARN – ResourceManager

Other posts in this series:
Introducing Apache Hadoop YARN
Apache Hadoop YARN – Background and an Overview
Apache Hadoop YARN – Concepts and Applications
Apache Hadoop YARN – ResourceManager
Apache Hadoop YARN – NodeManager

Apache Hadoop YARN – ResourceManager

As previously described, ResourceManager (RM) is the master that arbitrates all the available cluster resources and thus helps manage the distributed applications running on the YARN system. It works together with the per-node NodeManagers (NMs) and the per-application ApplicationMasters (AMs).

  1. NodeManagers take instructions from the ResourceManager and manage resources available on a single node.
  2. ApplicationMasters are responsible for negotiating resources with the ResourceManager and for working with the NodeManagers to start the containers.

Diagram of resource manager components

ResourceManager Components

The ResourceManager has the following components (see the figure above):

  1. Components interfacing RM to the clients:
    • ClientService: The client interface to the Resource Manager. This component handles all the RPC interfaces to the RM from the clients including operations like application submission, application termination, obtaining queue information, cluster statistics etc.
    • AdminService: To make sure that admin requests don’t get starved due to the normal users’ requests and to give the operators’ commands the higher priority, all the admin operations like refreshing node-list, the queues’ configuration etc. are served via this separate interface.
  2. Components connecting RM to the nodes:
    • ResourceTrackerService: This is the component that responds to RPCs from all the nodes. It is responsible for registration of new nodes, rejecting requests from any invalid/decommissioned nodes, obtain node-heartbeats and forward them over to the YarnScheduler. It works closely with NMLivelinessMonitor and NodesListManager described below.
    • NMLivelinessMonitor: To keep track of live nodes and specifically note down the dead nodes, this component keeps track of each node’s its last heartbeat time. Any node that doesn’t heartbeat within a configured interval of time, by default 10 minutes, is deemed dead and is expired by the RM. All the containers currently running on an expired node are marked as dead and no new containers are scheduling on such node.
    • NodesListManager: A collection of valid and excluded nodes. Responsible for reading the host configuration files specified via yarn.resourcemanager.nodes.include-path and yarn.resourcemanager.nodes.exclude-path and seeding the initial list of nodes based on those files. Also keeps track of nodes that are decommissioned as time progresses.
  3. Components interacting with the per-application AMs:
    • ApplicationMasterService: This is the component that responds to RPCs from all the AMs. It is responsible for registration of new AMs, termination/unregister-requests from any finishing AMs, obtaining container-allocation & deallocation requests from all running AMs and forward them over to the YarnScheduler. This works closely with AMLivelinessMonitor described below.
    • AMLivelinessMonitor: To help manage the list of live AMs and dead/non-responding AMs, this component keeps track of each AM and its last heartbeat time. Any AM that doesn’t heartbeat within a configured interval of time, by default 10 minutes, is deemed dead and is expired by the RM. All the containers currently running/allocated to an AM that gets expired are marked as dead. RM schedules the same AM to run on a new container, allowing up to a maximum of 4 such attempts by default.
  4. The core of the ResourceManager – the scheduler and related components:
    • ApplicationsManager: Responsible for maintaining a collection of submitted applications. Also keeps a cache of completed applications so as to serve users’ requests via web UI or command line long after the applications in question finished.
    • ApplicationACLsManager: RM needs to gate the user facing APIs like the client and admin requests to be accessible only to authorized users. This component maintains the ACLs lists per application and enforces them whenever an request like killing an application, viewing an application status is received.
    • ApplicationMasterLauncher: Maintains a thread-pool to launch AMs of newly submitted applications as well as applications whose previous AM attempts exited due to some reason. Also responsible for cleaning up the AM when an application has finished normally or forcefully terminated.
    • YarnScheduler: The Scheduler is responsible for allocating resources to the various running applications subject to constraints of capacities, queues etc. It performs its scheduling function based on the resource requirements of the applications such as memory, CPU, disk, network etc. Currently, only memory is supported and support for CPU is close to completion.
    • ContainerAllocationExpirer: This component is in charge of ensuring that all allocated containers are used by AMs and subsequently launched on the correspond NMs. AMs run as untrusted user code and can potentially hold on to allocations without using them, and as such can cause cluster under-utilization. To address this, ContainerAllocationExpirer maintains the list of allocated containers that are still not used on the corresponding NMs. For any container, if the corresponding NM doesn’t report to the RM that the container has started running within a configured interval of time, by default 10 minutes, the container is deemed as dead and is expired by the RM.
  5. TokenSecretManagers (for security):ResourceManager has a collection of SecretManagers which are charged with managing tokens, secret-keys that are used to authenticate/authorize requests on various RPC interfaces. A future post on YARN security will cover a more detailed descriptions of the tokens, secret-keys and the secret-managers but a brief summary follows:
    • ApplicationTokenSecretManager: To avoid arbitrary processes from sending RM scheduling requests, RM uses the per-application tokens called ApplicationTokens. This component saves each token locally in memory till application finishes and uses it to authenticate any request coming from a valid AM process.
    • ContainerTokenSecretManager: SecretManager for ContainerTokens that are special tokens issued by RM to an AM for a container on a specific node. ContainerTokens are used by AMs to create a connection to the corresponding NM where the container is allocated. This component is RM-specific, keeps track of the underlying master and secret-keys and rolls the keys every so often.
    • RMDelegationTokenSecretManager: A ResourceManager specific delegation-token secret-manager. It is responsible for generating delegation tokens to clients which can be passed on to unauthenticated processes that wish to be able to talk to RM.
  6. DelegationTokenRenewer: In secure mode, RM is Kerberos authenticated and so provides the service of renewing file-system tokens on behalf of the applications. This component renews tokens of submitted applications as long as the application runs and till the tokens can no longer be renewed.

Conclusion

In YARN, the ResourceManager is primarily limited to scheduling i.e. only arbitrating available resources in the system among the competing applications and not concerning itself with per-application state management. Because of this clear separation of responsibilities coupled with the modularity described above, and with the powerful scheduler API discussed in the previous post, RM is able to address the most important design requirements – scalability, support for alternate programming paradigms.

To allow for different policy constraints, the scheduler described above in the RM is pluggable and allows for different algorithms. In a future post of this series, we will dig deeper into various features of CapacityScheduler that schedules containers based on capacity guarantees and queues.

The next post will dive into details of the NodeManager, the component responsible for managing the containers’ life cycle and much more.

Other posts in this series:
Introducing Apache Hadoop YARN
Apache Hadoop YARN – Background and an Overview
Apache Hadoop YARN – Concepts and Applications
Apache Hadoop YARN – ResourceManager
Apache Hadoop YARN – NodeManager

Apache Hadoop YARN – Background and an Overview

Other posts in this series:
Introducing Apache Hadoop YARN
Apache Hadoop YARN – Background and an Overview
Apache Hadoop YARN – Concepts and Applications
Apache Hadoop YARN – ResourceManager
Apache Hadoop YARN – NodeManager

Apache Hadoop YARN – Background & Overview

Celebrating the significant milestone that was Apache Hadoop YARN being promoted to a full-fledged sub-project of Apache Hadoop in the ASF we present the first blog in a multi-part series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.

MapReduce – The Paradigm

Essentially, the MapReduce model consists of a first, embarrassingly parallel, map phase where input data is split into discreet chunks to be processed. It is followed by the second and final reduce phase where the output of the map phase is aggregated to produce the desired result. The simple, and fairly restricted, nature of the programming model lends itself to very efficient and extremely large-scale implementations across thousands of cheap, commodity nodes.

Apache Hadoop MapReduce is the most popular open-source implementation of the MapReduce model.

In particular, when MapReduce is paired with a distributed file-system such as Apache Hadoop HDFS, which can provide very high aggregate I/O bandwidth across a large cluster, the economics of the system are extremely compelling – a key factor in the popularity of Hadoop.

One of the keys to this is the lack of data motion i.e. move compute to data and do not move data to the compute node via the network. Specifically, the MapReduce tasks can be scheduled on the same physical nodes on which data is resident in HDFS, which exposes the underlying storage layout across the cluster. This significantly reduces the network I/O patterns and keeps most of the I/O on the local disk or within the same rack – a core advantage.

Apache Hadoop MapReduce, circa 2011 – A Recap

Apache Hadoop MapReduce is an open-source, Apache Software Foundation project, which is an implementation of the MapReduce programming paradigm described above. Now, as someone who has spent over six years working full-time on Apache Hadoop, I normally like to point out that the Apache Hadoop MapReduce project itself can be broken down into the following major facets:

  • The end-user MapReduce API for programming the desired MapReduce application.
  • The MapReduce framework, which is the runtime implementation of various phases such as the map phase, the sort/shuffle/merge aggregation and the reduce phase.
  • The MapReduce system, which is the backend infrastructure required to run the user’s MapReduce application, manage cluster resources, schedule thousands of concurrent jobs etc.

This separation of concerns has significant benefits, particularly for the end-users – they can completely focus on the application via the API and allow the combination of the MapReduce Framework and the MapReduce System to deal with the ugly details such as resource management, fault-tolerance, scheduling etc.

The current Apache Hadoop MapReduce System is composed of the JobTracker, which is the master, and the per-node slaves called TaskTrackers.

The JobTracker is responsible for resource management (managing the worker nodes i.e. TaskTrackers), tracking resource consumption/availability and also job life-cycle management (scheduling individual tasks of the job, tracking progress, providing fault-tolerance for tasks etc).

The TaskTracker has simple responsibilities – launch/teardown tasks on orders from the JobTracker and provide task-status information to the JobTracker periodically.

For a while, we have understood that the Apache Hadoop MapReduce framework needed an overhaul. In particular, with regards to the JobTracker, we needed to address several aspects regarding scalability, cluster utilization, ability for customers to control upgrades to the stack i.e. customer agility and equally importantly, supporting workloads other than MapReduce itself.

We’ve done running repairs over time, including recent support for JobTracker availability and resiliency to HDFS issues (both of which are available in Hortonworks Data Platform v1 i.e. HDP1) but lately they’ve come at an ever-increasing maintenance cost and yet, did not address core issues such as support for non-MapReduce and customer agility.

Why support non-MapReduce workloads?

MapReduce is great for many applications, but not everything; other programming models better serve requirements such as graph processing (Google Pregel / Apache Giraph) and iterative modeling (MPI). When all the data in the enterprise is already available in Hadoop HDFS having multiple paths for processing is critical.

Furthermore, since MapReduce is essentially batch-oriented, support for real-time and near real-time processing such as stream processing and CEPFresil are emerging requirements from our customer base.

Providing these within Hadoop enables organizations to see an increased return on the Hadoop investments by lowering operational costs for administrators, reducing the need to move data between Hadoop HDFS and other storage systems etc.

Why improve scalability?

Moore’s Law… Essentially, at the same price-point, the processing power available in data-centers continues to increase rapidly. As an example, consider the following definitions of commodity servers:

  • 2009 – 8 cores, 16GB of RAM, 4x1TB disk
  • 2012 – 16+ cores, 48-96GB of RAM, 12x2TB or 12x3TB of disk.

Generally, at the same price-point, servers are twice as capable today as they were 2-3 years ago – on every single dimension.  Apache Hadoop MapReduce is known to scale to production deployments of ~5000 nodes of hardware of 2009 vintage. Thus, ongoing scalability needs are ever present given the above hardware trends.

What are the common scenarios for low cluster utilization?

In the current system, JobTracker views the cluster as composed of nodes (managed by individual TaskTrackers) with distinct map slots and reduce slots, which are not fungible.  Utilization issues occur because maps slots might be ‘full’ while reduce slots are empty (and vice-versa).  Fixing this was necessary to ensure the entire system could be used to its maximum capacity for high utilization.

What is the notion of customer agility?

In real-world deployments, Hadoop is very commonly deployed as a shared, multi-tenant system. As a result, changes to the Hadoop software stack affect a large cross-section if not the entire enterprise. Against that backdrop, customers are very keen on controlling upgrades to the software stack as it has a direct impact on their applications. Thus, allowing multiple, if limited, versions of the MapReduce framework is critical for Hadoop.

Enter Apache Hadoop YARN

The fundamental idea of YARN is to split up the two major responsibilities of the JobTracker i.e. resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager and per-application ApplicationMaster (AM).

The ResourceManager and per-node slave, the NodeManager (NM), form the new, and generic, system for managing applications in a distributed manner.

The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The per-application ApplicationMaster is, in effect, a framework specific entity and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the component tasks.

The ResourceManager has a pluggable Scheduler, which is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application, offering no guarantees on restarting failed tasks either due to application failure or hardware failures. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a Resource Container which incorporates resource elements such as memory, cpu, disk, network etc.

The NodeManager is the per-machine slave, which is responsible for launching the applications’ containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the ResourceManager.

The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress. From the system perspective, the ApplicationMaster itself runs as a normal container.

Here is an architectural view of YARN:

One of the crucial implementation details for MapReduce within the new YARN system that I’d like to point out is that we have reused the existing MapReduce framework without any major surgery. This was very important to ensure compatibility for existing MapReduce applications and users. More on this later.

The next post will dive further into the intricacies of the architecture and its benefits such as significantly better scaling, support for multiple data processing frameworks (MapReduce, MPI etc.) and cluster utilization.

 

Other posts in this series:
Introducing Apache Hadoop YARN
Apache Hadoop YARN – Background and an Overview
Apache Hadoop YARN – Concepts and Applications
Apache Hadoop YARN – ResourceManager
Apache Hadoop YARN – NodeManager