Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is and in part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In this blog, we will take a look under the hood and demonstrate a Deep Learning framework (TensorFlow) assembly on Apache Hadoop YARN.
TensorFlow™ is an open source software library for numerical computation using data flow graphs. It is one of the most popular platforms for creating machine learning and deep learning applications.
We’ve talked about YARN and docker container runtime in Hadoop Summit 2016.
The recently published Nvidia Docker wrapper makes it possible to use GPU capabilities in a Docker container.
YARN has been used successfully to run all sorts of data applications. These applications can all coexist on a shared infrastructure managed through YARN’s centralized scheduling.
With TensorFlow, one can get started with deep learning without much knowledge about advanced math models and optimization algorithms.
If you have GPU-equipped hardware, and you want to run TensorFlow, going through the process of setting up hardware, installing the bits, and optionally also dealing with faults, scaling the app up and down etc. becomes cumbersome really fast. Instead, integrating TensorFlow to YARN allows us to seamlessly manage resources across machine learning / deep learning workloads and other YARN workloads like MapReduce, Spark, Hive, etc.
If you have a few GPUs and you want them to be shared by multiple tenants and applications to run TensorFlow and other YARN applications all on the same cluster, this blogpost will help you.
This blog uses a couple of features of YARN, under ongoing active development:
Both of the features will most likely be available in one of the Apache Hadoop 3.x releases.
A common workflow of TensorFlow (And this is common for any supervised machine learning platform) is like this:
Also, TensorFlow added support for distributed training since 0.8 release, which can significantly shorten training time in lots of cases. With YARN, we can start a distributed training/serving cluster combo in no time!
Below is an example of assembly description file for TensorFlow assembly.
In our example, we have 3 components:
So we have 3 components in the assembly description file.
Let’s look at the description of each of the components:
First, let’s look at the definition of trainer:
Description of parameter server is very much similar to trainer, the only difference is specify job_name to “ps” instead of “worker”
Following is a description of serving server, it changed to use different docker image and launch command to make it serves inception v3 model to do image classification. I will cover more details in the demo video below.
The following demo video shows the launch of TensorFlow assembly on YARN, running TensorFlow training and serving the inception-v3 model to do image classification.
Nvidia-docker is a wrapper of /usr/bin/docker, which is required to make processes inside docker container to use GPU capacity. So first of all, you need to Follow the Nvidia-Docker installation guide to properly install dependencies such as Docker / Nvidia driver, etc. on all the nodes.
After Nvidia Docker is installed, test it by executing the following command on each of the nodes:
You should be able to see output like:
Then, you need to set YARN configuration to use nvidia-docker binary to launch container, just update container-executor.cfg under $HADOOP_CONF_DIR, set docker.binary to
Once all above steps are done, you can launch docker containers through YARN as same as part 1. And all these containers can access GPU.
The training process can speed a lot when it runs with GPU support!
Below is screen recording of running same training process without and with GPU.