Get fresh updates from Hortonworks by email

Once a month, receive latest insights, trends, analytics information and knowledge of Big Data.


Sign up for the Developers Newsletter

Once a month, receive latest insights, trends, analytics information and knowledge of Big Data.


Get Started


Ready to Get Started?

Download sandbox

How can we help you?

* I understand I can unsubscribe at any time. I also acknowledge the additional information found in Hortonworks Privacy Policy.
closeClose button
August 08, 2018
prev slideNext slide

Distributed Pricing Engine using Dockerized Spark on YARN w/ HDP 3.0 [Part 3/4]

This is the 3nd blog in the series (see part 1, part 2) where we will walk through the tech stack and prepare the environment.

The entire infrastructure is provisioned on OpenStack private cloud using Cloudbreak 2.7.0 which first automates the installation of Docker CE 18.03 on CentOS 7.4 VMs and then the installation of HDP 3.0 cluster with Apache Hadoop 3.1 and Apache Spark 2.3 using the new shiny Apache Ambari 2.7.0

To Try This at Home

Ensure you have access to a Hortonworks Cloudbreak 2.7.0 instance. Please refer to the documentation to meet the prerequisites and setup credentials for the desired cloud provider.

Clone this repo

Update the following as desired:

  • Cloudbreak cluster definition
  • Ambari cluster blueprint

Now upload the following to your Cloudbreak instance:

  • Ambari Blueprint

  • Recipe to install Docker CE with custom Docker daemon settings:

  • Now execute the following using Cloudbreak CLI to provision the cluster:

    cb cluster create cliinputjson <cluster-def.json>name <cluster-name>

This will first instantiate a cluster using the cluster definition JSON and the referenced base image, download packages for Ambari and HDP, install Docker (a pre-requisite to running Dockerized apps on YARN) and setup the DB for Ambari and Hive using the recipes and then install HDP 3 using the Ambari blueprint.

Once the cluster is built, you should be able to log into Ambari to verify

Now, we will configure YARN Node Manager to run LinuxContainerExecutor in non-secure mode, just for demonstration purpose, so that all Docker containers scheduled by YARN will run as ‘nobody’ user. Kerberized cluster with cgroups enabled is recommended for production.

Enable Docker Runtime for YARN

Update yarn-site.xml and container-executor.cfg as follows:


A few configurations to note here:

  • Setting nodemanager.container-exexutor.class to use LinuxContainerExecutor
  • Setting to a value (50) less than user id of user ‘nobody’ (99)
  • Mounting /etc/passwd on read-only mode into the Docker containers to expose the spark user
  • Adding requisite Docker registries to the trusted list

Now restart YARN

We are now ready to simulate pricing of instruments which we will take a look at in the final blog of the series, you can find it here!


Geek Squad says:

I appreciate your blog because it contains good stuff, true knowledge, thanks for sharing such information, you may contact us for any home appliance including Electronics Gadgets through

Leave a Reply

Your email address will not be published. Required fields are marked *

If you have specific technical questions, please post them in the Forums