Get fresh updates from Hortonworks by email

Once a month, receive latest insights, trends, analytics information and knowledge of Big Data.

cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button
March 08, 2017
prev slideNext slide

Data Lake 3.0 Part 3 – Distributed TensorFlow Assembly on Apache Hadoop YARN

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is and in part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In this blog, we will take a look under the hood and demonstrate a Deep Learning framework (TensorFlow) assembly on Apache Hadoop YARN.

TensorFlow™ is an open source software library for numerical computation using data flow graphs. It is one of the most popular platforms for creating machine learning and deep learning applications.

It’s best to run TensorFlow on machines equipped with GPUs, since TensorFlow can leverage CUDA and cuDNN to speed-up a lot by leveraging GPU power!

We’ve talked about YARN and docker container runtime in Hadoop Summit 2016.

The recently published Nvidia Docker wrapper makes it possible to use GPU capabilities in a Docker container.

Part 0. Why TensorFlow assembly on YARN?

YARN has been used successfully to run all sorts of data applications. These applications can all coexist on a shared infrastructure managed through YARN’s centralized scheduling.

With TensorFlow, one can get started with deep learning without much knowledge about advanced math models and optimization algorithms.

If you have GPU-equipped hardware, and you want to run TensorFlow, going through the process of setting up hardware, installing the bits, and optionally also dealing with faults, scaling the app up and down etc. becomes cumbersome really fast. Instead, integrating TensorFlow to YARN allows us to seamlessly manage resources across machine learning / deep learning workloads and other YARN workloads like MapReduce, Spark, Hive, etc.

If you have a few GPUs and you want them to be shared by multiple tenants and applications to run TensorFlow and other YARN applications all on the same cluster, this blogpost will help you.

Part 1. Background

This blog uses a couple of features of YARN, under ongoing active development:

  1. YARN-3611, Support Docker Containers In LinuxContainerExecutor: Better support of Docker container execution in YARN
  2. YARN-4793, Simplified API layer for services and beyond: The new simple-services API layer backed by REST interfaces. This API can be used to create and manage the lifecycle of YARN services in a simple manner. Services here can range from simple single-component apps to complex multi-component assemblies needing orchestration.

Both of the features will most likely be available in one of the Apache Hadoop 3.x releases.

Part 2. Running TensorFlow Services Assembly on YARN

A common workflow of TensorFlow (And this is common for any supervised machine learning platform) is like this:

  1. Training cluster reads from input dataset, uses algorithms to build a data model. For example, see Google’s cat recognition learning.
  2. Serving cluster serves the data model, so that any client can connect to the serving cluster to predict new samples, for example, ask “Is this a cat?”
  3. Training cluster can periodically update data model when new data available, so clients can get prediction results from the up-to-date model.

Also, TensorFlow added support for distributed training since 0.8 release, which can significantly shorten training time in lots of cases. With YARN, we can start a distributed training/serving cluster combo in no time!

Assembly description file

To orchestrate multi-component assembly on YARN, we need to prepare a description file so YARN can know how to launch it (please refer to the video and slides for an introduction on assemblies).

Below is an example of assembly description file for TensorFlow assembly.

In our example, we have 3 components:

  1. Training cluster components
    1. Trainers: Training the model from training.
    2. Parameter servers: Manage shared model parameters for trainers.
  2. Serving cluster components: Uses tensorflow-serving to serve exported data model from training cluster.

So we have 3 components in the assembly description file.

Let’s look at the description of each of the components:

Trainer

First, let’s look at the definition of trainer:

Some notes:

  1. Docker image is specified to be “hortonworks/tensorflow-yarn:0.2”
  2. The launch command to use is example-distributed-trainer.py
  3. It specified hostname/ports of parameter servers and workers (trainers). We configured DNS to make launched container has a unique hostname to: <component_name>.<assembly-name>.<user-name>.<domain>

Parameter Server

Description of parameter server is very much similar to trainer, the only difference is specify job_name to “ps” instead of “worker”

Serving server

Following is a description of serving server, it changed to use different docker image and launch command to make it serves inception v3 model to do image classification. I will cover more details in the demo video below.

Demo

The following demo video shows the launch of TensorFlow assembly on YARN, running TensorFlow training and serving the inception-v3 model to do image classification.

Part 3. Accessing GPU in YARN assembly

Configurations

Nvidia-docker is a wrapper of /usr/bin/docker, which is required to make processes inside docker container to use GPU capacity. So first of all, you need to Follow the Nvidia-Docker installation guide to properly install dependencies such as Docker / Nvidia driver, etc. on all the nodes.

After Nvidia Docker is installed, test it by executing the following command on each of the nodes:

You should be able to see output like:

Then, you need to set YARN configuration to use nvidia-docker binary to launch container, just update container-executor.cfg under $HADOOP_CONF_DIR, set docker.binary to

Once all above steps are done, you can launch docker containers through YARN as same as part 1. And all these containers can access GPU.

The training process can speed a lot when it runs with GPU support!

Below is screen recording of running same training process without and with GPU.

Without GPU

With GPU

Read the next blog in the series: Data Lake 3.0 Part 4 – Cutting Storage Overhead in Half with HDFS Erasure Coding

Comments

  • Great article. Especially when Yahoo Research just launched TensorflowOnSpark and Intel launching BigDL last week.

    a) How do you get access to Tensorboard when you launch this in the cluster. Would love to see the details on the posting.

    b) Tensorflow has a device attribute to run on CPU or GPU. https://www.tensorflow.org/tutorials/using_gpu

    c) Need more details on training time how do you specify the different hosts. This is one of the challenge Tensorflow on Spark has.

    Love to get a blog with more details on Training time and GPU Isolations.

    There is general Resource Management issues in large expensive GPU clusters. Would love to have Yarn community to work closely with Tensorflow, Caffe, torch etc for caching and performance improvement in training large networks.

  • Leave a Reply

    Your email address will not be published. Required fields are marked *

    If you have specific technical questions, please post them in the Forums

    You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>