The Hortonworks Blog

Posts categorized by: YARN

The last couple of weeks have been a period of intense activity around the Apache projects that comprise the Hadoop ecosystem. While most of the headlines were accorded to Apache Hadoop 2 going GA, it would be remiss not to pay attention to the great progress being made in the Apache projects that complement Hadoop.

We have blogged about these over the course of the past week and the list below provides a quick summary of the phenomenal work contributed in the open by the folks driving these diverse and vital communities.…

This post is authored by Omkar Vinit Joshi with Vinod Kumar Vavilapalli and is the 8th post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series: 

Introduction

In YARN, applications perform their work by running containers, which today map to processes on the underlying operating system.…

An important tool in the Hadoop developer toolkit is the ability to look at key metrics for a MapReduce job – to understand the performance of each job and to optimize future job runs.

In this blog article, we’ll explore how HDP 2.0 stores and provides insight into the performance of a MapReduce job on YARN.

Change from MapReduce v1 and HDP 1.x

In MapReduce-v2 on YARN in HDP 2.0, the JobTracker no longer exists.…

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week – as Hadoop 2 goes GA – Arun Murthy discusses his journey with Hadoop. The journey has taken Arun from developing Hadoop, to founding Hortonworks, to this week’s release of Hadoop 2, with its YARN-based architecture.

Arun describes the difference between MapReduce and YARN, and how they are related in Hadoop 2 (and by extension in Hortonworks Data Platform v2).…

I’m thrilled to note that the Apache Hadoop community has declared Apache Hadoop 2.x as Generally Available with the release of hadoop-2.2.0!

This represents the realization of a massive effort by the entire Apache Hadoop community which started nearly 4 years ago, and we’re sure you’ll agree it’s cause for a big celebration. Equally, it’s a great credit to the Apache Software Foundation which provides an environment where contributors from various places and organizations can collaborate to achieve a goal which is as significant as Apache Hadoop v2.…

I’ve been sitting on this post for a while as Apache Hadoop 2 GA work was keeping me extremely busy. As they say, better late than never, so here we go – the slides are at the end of the post.

Three weeks ago, we had an Apache Hadoop YARN meetup at LinkedIn. The kind folks at LinkedIn had offered to host us in addition to talking about exciting projects like usage of YARN at LinkedIn, and applications on YARN like Apache Samza, Apache Giraph and Apache Helix.…

Apache Storm and YARN extend Hadoop to handle real-time processing of data and provide the ability to process and respond to events as they happen. Our customers have told us about many use cases for this technology combination, and below we present a demo example complete with code so you can try it yourself.

For the demo below, we used our Sandbox VM which is a full implementation of the Hortonworks Data Platform.…

This post is from Vinod Kumar Vavilapalli of Hortonworks and Chris Douglas and Carlo Curino of Microsoft Research.

Great news from the Apache Hadoop YARN community! A paper describing Apache Hadoop YARN was accepted at the 2013 ACM Symposium on Cloud Computing (SoCC 2013), where it won the award for best paper! Here’s the title and abstract:

Title

Apache Hadoop YARN: Yet Another Resource Negotiator [Industrial Paper]

Abstract

The initial design of Apache Hadoop was tightly focused on running massive, MapReduce jobs to process a web crawl.…

This post is the fourth in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

The previous couple of blogs covered Tez concepts and APIs.…

Personally, I’ve followed the Go Programming Language (golang) with increasing interest for a while and have been itching to really sink my teeth into it. I’ve always felt you never learn any programming language for real unless it’s used to build a fairly large, real-world solution. It’s the only way to tackle real issues and gain some confidence for future battles with destiny… FTR, my first real project in Java was Hadoop, circa 2006.…

YARN and the Hortonworks Data Platform 2.0 enable one Hadoop cluster to share data and analytical processing capabilities across the enterprise. Organizations can use the Hortonworks Data Platform 2.0 to:

  • Pool all enterprise data into one scalable and reliable storage platform
  • Enable all analytical processing IN the data platform
  • Provide access to this data and processing across all business units

The Capacity Scheduler (CS) ensures that groups of users and applications will get a guaranteed share of the cluster, while maximizing overall utilization of the cluster.…

As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best: process data. With YARN, you can now run multiple applications in Hadoop, all sharing common resource management.

In this blog post we’ll walk through how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment.…

We hosted a webinar on YARN a couple of weeks ago (see the slides and playback here). As you might expect, there were a lot of great questions, and here is a set of answers to those questions.

Our next YARN-oriented Office Hours is online on Sept 11th at 2pm PST. Join us on Meetup!

Who is using YARN and what benefits have they received from it?

One great public example of in-production use of YARN is at Yahoo!.…

Another week, another release…  Following the release of Apache Hadoop 2.0 beta last week, we are excited to release the beta of Hortonworks Data Platform 2.0, the first commercial release of the stable YARN API and protocols on which new applications can now be built.

For our customers this is a great opportunity to ensure the release meets expectations and provides a vehicle to voice feedback that will work to improve Hadoop and shape its roadmap. …

If you’re heading back to work today after a long hot summer, then here are some notes on last week here at Hortonworks.

Building a modern data architecture. We kicked off the week with some discussion on what it means to implement Hadoop alongside existing data architecture components. Jim covered 3 essential requirements: integration with existing systems, reuse of existing skills, and enterprise requirements such as reliability and availability. We also held the first webinar in our series on implementing Hadoop in the enterprise: this one was with Teradata.…
