How to build a Hadoop VM with Ambari and Vagrant

Hadoop Inception.

In this post, we will explore how to quickly and easily spin up our own VM with Vagrant and Apache Ambari. Vagrant is popular with developers because it lets you mirror the production environment in a VM while keeping your IDEs and tools in the comfort of the host OS.

If you’re just looking to get started with Hadoop in a VM, then you can simply download the Hortonworks Sandbox.


Spin up a VM with Vagrant

Create a folder for this VM: mkdir hdp_vm

If you have VirtualBox and Vagrant installed on your system, change directory to the new folder and issue the following command:

vagrant box add hdp_vm

Once the download has completed and the box has been added to your library of VMs under the name hdp_vm, issue the command:

vagrant init hdp_vm

This will create a file ‘Vagrantfile’ in the folder. Open it in a text editor like ‘vi’:

Edit the ‘Vagrantfile’, so that port 8080 on the VM is forwarded to port 8080 on the host:

Let’s also modify the settings so that the VM is assigned adequate memory when it is launched:
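The two Vagrantfile edits above might look like the following sketch. It rewrites the generated file as a minimal one instead of editing it in place, so keep a copy if you have customized it. The 4096 MB memory figure is an assumption, not a value from this post; adjust it to what your host can spare.

```shell
# Run on the host, inside the hdp_vm folder. This replaces the generated Vagrantfile.
cat > Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
  # The box added earlier with 'vagrant box add'
  config.vm.box = "hdp_vm"

  # Forward guest port 8080 (used by Ambari later in this post) to port 8080 on the host
  config.vm.network "forwarded_port", guest: 8080, host: 8080

  # Give the VM more memory; 4096 MB is an assumed value
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 4096
  end
end
EOF
```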

We are ready to launch the VM. Once the VM is launched, SSH in, log in as root, and change to root’s home directory:
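A minimal sketch of the launch step, assuming a stock Vagrant box in which the default 'vagrant' user has passwordless sudo. The function name launch_vm is only illustrative; the commands are wrapped in it so that nothing runs until you call it.

```shell
# Run on the host, inside the hdp_vm folder.
launch_vm() {
  vagrant up     # boot the VM described by ./Vagrantfile
  vagrant ssh    # open a shell in the guest as the default 'vagrant' user
}

# Then, inside the guest:
#   sudo su -    # login shell as root, which starts in root's home directory
```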

Configure the VM

Find out the default hostname of the VM and note it down:
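One way to do this from the guest shell is sketched below. The variable name vm_hostname is only illustrative.

```shell
# Capture the VM's hostname so it can be reused in later steps
vm_hostname="$(hostname)"
echo "$vm_hostname"
```

If the box has a fully qualified name configured, `hostname -f` prints that instead.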

Then we need to edit the ‘/etc/hosts’ file so that it contains an entry for this hostname. Open ‘/etc/hosts’ in ‘vi’ and it might look like this:

It needs to look like this:
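As a rough sketch of the same edit without ‘vi’, the snippet below works on a copy of the file and adds an entry only if the hostname is missing. Mapping the hostname to 127.0.0.1 is an assumption that suits a single-node VM; the file name hosts.new is hypothetical, and your box may need a different address.

```shell
vm_hostname="$(hostname)"

# Work on a copy so the real file is untouched until you have reviewed the result
cp /etc/hosts hosts.new

# Append an entry only if the hostname does not already appear in the file
if ! grep -qwF "$vm_hostname" hosts.new; then
  printf '127.0.0.1\t%s\n' "$vm_hostname" >> hosts.new
fi

cat hosts.new
# After reviewing the output, as root:
#   cp hosts.new /etc/hosts
```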

Now we will install the NTP service with the following command:

yum install ntp

Next we will install the wget utility with the following command:

yum install wget

Once these are installed, turn on the NTP service with the commands:

chkconfig ntpd on
service ntpd start

Setting up passwordless SSH

Get a pair of keys: ssh-keygen

The keys will be placed in the .ssh folder under root’s home directory.

  • Copy the id_rsa file to the /vagrant folder so that you can access the private key from the host machine; /vagrant is automatically shared between the host and guest OSs.
  • Also append the public key to the authorized_keys file.
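The key steps above can be sketched as a small function. The name setup_keys and its two optional arguments are hypothetical; they default to root’s .ssh folder and /vagrant as described in this post. The empty passphrase is an assumption made so that the key can be used non-interactively.

```shell
setup_keys() {
  local ssh_dir="${1:-$HOME/.ssh}"    # where the key pair lives
  local share_dir="${2:-/vagrant}"    # folder shared with the host

  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"

  # Generate an RSA key pair with an empty passphrase unless one already exists
  [ -f "$ssh_dir/id_rsa" ] || ssh-keygen -q -t rsa -N '' -f "$ssh_dir/id_rsa"

  # Make the private key reachable from the host through the shared folder
  cp "$ssh_dir/id_rsa" "$share_dir/"

  # Authorize the public key for logins to this VM
  cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
}
```

Run as root in the guest, calling `setup_keys` with no arguments uses the defaults.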

Set up Ambari

Download and copy the Ambari repository bits to /etc/yum.repos.d:

cp ambari.repo /etc/yum.repos.d
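The download half of this step could be sketched as follows, using the wget utility installed earlier. AMBARI_REPO_URL is a placeholder, not a real address: set it to the ambari.repo URL that matches your Ambari version and OS. The function name fetch_ambari_repo is likewise only illustrative.

```shell
fetch_ambari_repo() {
  local url="$1"
  # Refuse to run without a URL rather than guess one
  if [ -z "$url" ]; then
    echo "usage: fetch_ambari_repo <url-of-ambari.repo>" >&2
    return 1
  fi
  # Save the repository definition as ambari.repo in the current directory
  wget -O ambari.repo "$url"
}

# Example call, followed by the copy step from this post:
#   fetch_ambari_repo "$AMBARI_REPO_URL" && cp ambari.repo /etc/yum.repos.d
```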

Double check that the repo has been configured correctly:
yum repolist

Now we are ready to install the bits from the repo:
yum install ambari-server

Now we can configure the bits. I just go with the defaults during the configuration:
ambari-server setup

Let’s spin up Ambari:
ambari-server start

Setting up the pseudo-cluster with Ambari

Now you can access Ambari from your host machine at the URL http://localhost:8080. The username and password are both admin:

Name your cluster:

Select HDP 2.0:

Input the hostname of your VM and click on the Choose File button:

Select the private key file you can find in the folder you created at the beginning of this post:

Select the default options for the rest of the steps until you get to Customize Services. In this step, configure your preferred credentials, especially for the components marked with a white number on a red background:

Finish up the wizard.

Voila!!! We have our very own Hadoop VM.

Happy Hadooping!


