How to build a Hadoop VM with Ambari and Vagrant

Hadoop Inception.

In this post, we will explore how to quickly and easily spin up our own VM with Vagrant and Apache Ambari. Vagrant is popular with developers because it lets you mirror the production environment in a VM while keeping all your IDEs and tools in the comfort of the host OS.

If you’re just looking to get started with Hadoop in a VM, then you can simply download the Hortonworks Sandbox.


Spin up a VM with Vagrant

Create a folder for this VM: mkdir hdp_vm

With VirtualBox and Vagrant installed on your system, change into this folder and issue the following command:

vagrant box add hdp_vm

Once the download has completed and the box has been added to your library of VMs under the name hdp_vm, issue the command:

vagrant init hdp_vm

This will create a file ‘Vagrantfile’ in the folder. Open it in a text editor like ‘vi’:

Edit the ‘Vagrantfile’, so that port 8080 on the VM is forwarded to port 8080 on the host:
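
As a rough sketch of that edit, the snippet below appends a small configuration block to the Vagrantfile instead of editing it by hand. It assumes a Vagrant release that uses configuration version "2" and that merges several Vagrant.configure blocks found in one file; if you prefer, put the config.vm.network line inside the block that vagrant init generated.

```shell
# Run from the hdp_vm folder on the host.
# Appends a block that forwards guest port 8080 (Ambari) to host port 8080.
cat >> Vagrantfile <<'EOF'

Vagrant.configure("2") do |config|
  config.vm.network "forwarded_port", guest: 8080, host: 8080
end
EOF
```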

Let’s also modify the settings so that the VM is assigned adequate Memory once it is launched:
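
One way to express the memory setting, assuming the VirtualBox provider, is sketched below. The 4096 MB figure is only an illustrative assumption (the right amount depends on your host and on the services you plan to install), and the snippet again relies on Vagrant merging an extra Vagrant.configure block.

```shell
# Run from the hdp_vm folder on the host.
# Appends a block that passes a memory size to VirtualBox's "modifyvm" command.
cat >> Vagrantfile <<'EOF'

Vagrant.configure("2") do |config|
  config.vm.provider "virtualbox" do |vb|
    # 4096 MB is an assumed value; adjust it to what your host can spare.
    vb.customize ["modifyvm", :id, "--memory", "4096"]
  end
end
EOF
```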

We are ready to launch the VM. Once it is up, SSH in, log in as root, and change to root's home directory:
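
A minimal sketch of these steps follows. The function name launch_and_connect is made up for illustration, and the sketch assumes the box follows the usual Vagrant convention of a default 'vagrant' user with passwordless sudo.

```shell
# Run on the host, from the folder that holds the Vagrantfile.
launch_and_connect() {
  vagrant up     # boot the VM described by the Vagrantfile
  vagrant ssh    # open an interactive session as the default 'vagrant' user
}

# Then, inside the guest session:
#   sudo su -    # become root; the '-' starts a login shell in root's home directory
```

Call launch_and_connect when you are ready to boot the VM.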

Configure the VM

Find out the default hostname of the VM and note it down:
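
For instance, the hostname can be captured in a shell variable so it is at hand for the later steps. The variable name vm_hostname is just an illustrative choice.

```shell
# Inside the guest: read the machine's hostname and keep it for later.
# (hostname -f prints the fully qualified name, if name resolution is already set up.)
vm_hostname="$(hostname)"
echo "VM hostname: ${vm_hostname}"
```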

Then we need to edit the ‘/etc/hosts’ file so that it has an entry for this hostname. Open ‘/etc/hosts’ in ‘vi’ and check whether the hostname you noted appears in it; if it does not, add a line for it (an IP address followed by the hostname).
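
If you would rather script the change than make it in vi, here is a small sketch. The helper add_host_entry is hypothetical, and which IP address to pair with the hostname is an assumption you need to settle for your own VM; the helper only appends a line when the hostname is not already present.

```shell
# add_host_entry FILE IP HOSTNAME
# Appends "IP<TAB>HOSTNAME" to FILE unless a line already contains HOSTNAME.
add_host_entry() {
  hosts_file="$1"
  ip="$2"
  name="$3"
  if grep -qwF -- "$name" "$hosts_file" 2>/dev/null; then
    echo "entry for $name already present in $hosts_file"
  else
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
  fi
}

# Inside the guest, as root (replace <vm-ip> with the address you want to use):
#   add_host_entry /etc/hosts <vm-ip> "$(hostname)"
```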

Now we will install the NTP service with the following command:

yum install ntp

Next we will install the wget utility with the following command:

yum install wget

Once these are installed, turn on the ntp service with the commands:

chkconfig ntpd on
service ntpd start

Setting up passwordless SSH

Get a pair of keys: ssh-keygen

The keys will be placed in the folder .ssh.

  • Copy the id_rsa file to the /vagrant folder. Because /vagrant is automatically shared between the host and guest OSs, this makes the private key accessible from the host machine.
  • Also append the public key to the authorized_keys file.
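
The key steps above can be sketched as one helper. The function name setup_passwordless_ssh is made up for this example, and generating the key non-interactively with an empty passphrase is an assumption; running plain ssh-keygen and accepting the defaults gets you to the same place.

```shell
# setup_passwordless_ssh SSH_DIR SHARED_DIR
# Creates an RSA key pair in SSH_DIR (if none exists), copies the private key
# to SHARED_DIR, and authorizes the public key for logins to this machine.
setup_passwordless_ssh() {
  ssh_dir="$1"
  shared_dir="$2"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  # Empty passphrase (-N "") so the key can be used without prompting.
  [ -f "$ssh_dir/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa"
  # Make the private key reachable from the host through the shared folder.
  cp "$ssh_dir/id_rsa" "$shared_dir/"
  # Append the public key to authorized_keys and restrict its permissions.
  cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
}

# Inside the guest, as root:
#   setup_passwordless_ssh /root/.ssh /vagrant
```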

Set up Ambari

Download the Ambari repository file (ambari.repo) and copy it to /etc/yum.repos.d:

cp ambari.repo /etc/yum.repos.d

Double check that the repo has been configured correctly:
yum repolist

Now we are ready to install the bits from the repo:
yum install ambari-server

Now we can configure the bits. I just go with the defaults during the configuration:
ambari-server setup

Let’s spin up Ambari:
ambari-server start

Setting up the pseudo-cluster with Ambari

Now you can access Ambari from your host machine at the URL http://localhost:8080. The username and password are both admin:
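
Before opening the browser, you can check from the host whether anything answers on the forwarded port. This is only a sketch: the helper name ambari_reachable is invented here, and it assumes curl is installed on the host.

```shell
# ambari_reachable URL
# Succeeds if an HTTP server answers at URL with a non-error status.
ambari_reachable() {
  curl --silent --fail --output /dev/null --max-time 5 "$1"
}

if ambari_reachable "http://localhost:8080"; then
  echo "Ambari is answering on port 8080"
else
  echo "No answer on port 8080; check the port forwarding and the guest firewall"
fi
```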

Name your cluster:

Select HDP 2.0:

Input the hostname of your VM and click on the Choose File button:

Select the private key file (id_rsa), which you can find in the folder you created at the beginning of this post:

Select the default options for the rest of the steps until you get to Customize Services. In this step, configure your preferred credentials, especially for the components marked with a white number on a red background:

Finish up the wizard.

Voila!!! We have our very own Hadoop VM.

Happy Hadooping!




Brian de la Motte
August 28, 2014 at 5:02 pm

Excellent article! I am glad this material was available! Very easy to follow along and I love using Vagrant! The only thing I think this guide might be missing is installing and starting httpd.
I had to run the following:
yum install -y httpd
chkconfig httpd on
service httpd start

Other than that, everything went pretty smoothly!

September 3, 2014 at 1:33 am

It looks good. I will try it tonight.

I have a MacBook Air with 4 GB of RAM. Can I run this Vagrant box well? How much RAM do I need to run it?

September 3, 2014 at 11:31 pm

After the command ambari-server start
you must turn off iptables to access the web server.

chkconfig iptables off
/etc/init.d/iptables stop

October 20, 2015 at 7:27 am

looks like a very nice tutorial, only problem is my company blocks github, is there a way around this issue?
