Get fresh updates from Hortonworks by email

Once a month, receive latest insights, trends, analytics information and knowledge of Big Data.

cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button
September 22, 2017
prev slideNext slide

YInception: A YARN Based Container Cloud and How We Certify Hadoop on Hadoop

This is the second post in the Engineering @ Hortonworks blog series that explores how we in Hortonworks Engineering build, test and release new versions of our platforms.

In this post, we deep dive into something that we are extremely excited about – Running a container cloud on YARN! We have been using this next-generation infrastructure for more than a year in running all of Hortonworks internal CI / CD infrastructure.

With this, we can now run Hadoop on Hadoop to certify our releases! Let’s dive right in!

Certifying Hortonworks Platforms

The introductory post on Engineering @ Hortonworks gave the readers an overview of the scale of challenges we see in delivering an Enterprise Ready Data platform.

Essentially, for every new release of a platform, we provision Hadoop clusters on demand, with specific configurations like authentication on/off, encryption on/off, DB combinations, and OS environments, run a bunch of tests to validate changes, and shut them down.

And we do this over and over, day in and day out, throughout the year.

Older generation infrastructure

Certifying our platforms is thus an arduous deal. The specific challenge that is relevant to this post is how resources get shared and utilized efficiently when we bring up and down these Hadoop clusters. We discuss these issues related to resource management below.

  1. Scheduling needs: The existing infrastructure had very limited scheduling primitives to support hundreds of users running workloads at various points in the day. It was difficult to work with unpredictable resource sharing, minimal insights into historical usage, lack of limits per user / tenant etc. These are some of the things that YARN is build to do!
  2. Older infrastructure of VMs
    • The matrix of combinations discussed in the introductory post used to be tested on an older generation of infrastructure based on VMs. The older infrastructure had its share of pros and cons.The most significant con being that VMs got used in our infrastructure in several places where we really didn’t need them. We didn’t need the very coarse isolation that VMs provide in terms of both isolation as well as security sandboxing.
    • Using VMs in that manner didn’t help us in realizing the huge potential for resource utilization optimizations. We needed to do something to get more out of our hardware investments.
  3. Need for more internal long lived clusters: From the viewpoint of delivering a rigorously tested platform, the number of internal instances where clusters would run forever – and go through operations and upgrades that a typical customer goes through – was just not enough.
  4. Scaling needs: The existing VM based infrastructure was simply not scaling to our needs. Note that some of our needs are actually very unusual – we rapidly bring up and down VMs at large scale every day.

There is a dogfood opportunity here that helps both us internally as well as our customers and users: Use what we ship and ship what we use.

We recently talked about how Apache Hadoop is moving towards a multi-colored YARN. The need for such a multi-colored YARN is coming from the demand for new use-cases and technology drivers discussed in that post.

Time for YInception ! Moving to a YARN based Container Cloud

Looking at the internal pains discussed above, we asked ourselves – “Our users and customers have a very powerful platform for scheduling and managing their applications. Why not use that same platform for our internal needs?” To this end, nearly 15 months ago, we started migration of our workloads to a YARN based container cloud.

What is this new infrastructure? It uses the well-known HDP components that you are already familiar with – Ambari, Zookeeper, HDFS, YARN newly augmented with Docker support and Slider. It also is one of the few fully functioning production clusters that is running all the time inside Hortonworks.

If you look carefully, through this mechanism, one can spot a YARN cluster running inside (containers running on) another YARN cluster! That is, we have finally realized YInception – YARN on YARN!

YInception: Hadoop clusters running on Hadoop cluster
YInception: Hadoop clusters running on Hadoop cluster

This is not a radically new idea, though. This is very similar to how source code compilers go through their own evolution. Through the boostrapping process, developers use an older version of a compiler to build the newer version of the compiler! What we are doing here is to use an older version of Hadoop as a compute platform to test the newer versions of Hadoop. And Hive. And Spark. And everything!

Workloads running on YARN Container Cloud

Different type of apps run on this platform : (a) containerized apps and services – a key promise of YARN in Apache Hadoop 3.0 and (b) legacy applications that run inside containers made to look like VMs each with its own IP address, SSH based access etc.

HWX internal workloads running on YARN container cloud
HWX internal workloads running on YARN container cloud

Let’s look at one specific workload. The following is a depiction of a system-test cluster running on this container cloud. On the base YARN cluster, we dynamically bring up N containers, and then use Ambari to install HDP components inside those containers!

Running Hadoop system test clusters on Hadoop
Running Hadoop system test clusters on Hadoop

Similarly, running all unit-tests of the Apache Hadoop project can take more than 6-7 hours. We have developed a parallel Master-Worker framework that runs on the YARN Container Cloud which brings the unit-test run-time to 10-15 minutes! We will talk more about this in one of the next posts.

Present state

We have been running hard and fast with this new architecture for more than a year! The following graphs show some key metrics of this cluster over the last 15 months.

YARN Container Cloud Memory growth
YARN Container Cloud Memory growth

Growth in terms of containers’ memory resource over time.
Yellow: Used + Reserved Memory
Blue: Total cluster Memory

YARN Container Cloud VCores Growth
YARN Container Cloud VCores Growth

Growth in terms of containers’ CPU resource over time.
Yellow: Used + Reserved CPU
Blue: Total cluster CPU

As you can see, the cluster slowly got built over time. The workloads likewise slowly got migrated from the old platform to the new one.

Just looking at the number of apps and containers, we so far have more than 2.4M containers allocated!!

And close to 400K applications completed so far!

What next?

We first lifted the curtains on our next generation infrastructure at Dataworks Summit San Jose 2017:

Shane and Jian gave an excellent talk on this topic at the DataWorks Summit, San Jose: https://dataworkssummit.com/san-jose-2017/sessions/running-a-container-cloud-on-yarn/, and we will be condensing some of that information in a little blog post next!

To conclude, this container-cloud infrastructure on YARN is running inside Hortonworks at large scale for more than a year and continues to ratchet up on machine count as well as the volume / variety of workloads. Gour Saha, Billie Rinaldi, Shane Kumpf, Jian He (and Varun Vasudev, Jon Maron in the past) have all been part of this crazy team building such an amazing infrastructure.

The underlying source code for this infrastructure is derived from two key initiatives in Apache Hadoop project: (1) support in YARN for containerized applications (YARN-3853) and (2) first class support for services on YARN. Even though we have been running this code on a quote-production cluster-unquote, these are efforts that are largely works-in-progress that we hope to bring to your clusters as part of Apache Hadoop 3.0!

Leave a Reply

Your email address will not be published. Required fields are marked *

If you have specific technical questions, please post them in the Forums

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>