How to build a Hadoop VM with Ambari and Vagrant

Hadoop Inception.

In this post, we will explore how to quickly and easily spin up our own VM with Vagrant and Apache Ambari. Vagrant is very popular with developers as it lets you mirror the production environment in a VM while keeping all your IDEs and tools in the comfort of the host OS.

If you’re just looking to get started with Hadoop in a VM, then you can simply download the Hortonworks Sandbox.


Spin up a VM with Vagrant

Create a folder for this VM: mkdir hdp_vm


If you have VirtualBox and Vagrant installed on your system, change into that directory and issue the following command:

vagrant box add hdp_vm <box-url>

Here <box-url> stands in for the URL of the base box you want to use; the commands later in this post assume a RHEL/CentOS-style box.


Once the box has been downloaded and added to your library of VMs under the name hdp_vm, issue the command:

vagrant init hdp_vm

This will create a file ‘Vagrantfile’ in the folder. Open it in a text editor like ‘vi’.


Edit the ‘Vagrantfile’ so that port 8080 on the VM is forwarded to port 8080 on the host.
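In a freshly generated Vagrantfile, this amounts to one line inside the configure block. A sketch of the relevant part:

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "hdp_vm"
  # Forward the Ambari web UI: guest port 8080 -> host port 8080
  config.vm.network "forwarded_port", guest: 8080, host: 8080
end
```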


Let’s also modify the settings so that the VM is assigned adequate memory when it is launched.
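For the VirtualBox provider this can be sketched as below; the 4096 MB figure is an assumption, so size it to what your host can spare (Ambari plus the Hadoop services are memory-hungry):

```ruby
Vagrant.configure("2") do |config|
  config.vm.provider "virtualbox" do |vb|
    # Allocate RAM to the guest (value in MB; adjust as needed)
    vb.customize ["modifyvm", :id, "--memory", "4096"]
  end
end
```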


We are ready to launch the VM. Once it is up, SSH in, become root, and change to root’s home directory.
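From the hdp_vm folder on the host, the sequence might look like this (passwordless sudo for the vagrant user is standard in Vagrant base boxes):

```shell
vagrant up       # boot the VM defined by the Vagrantfile
vagrant ssh      # SSH in as the default 'vagrant' user
sudo su -        # become root; the '-' also drops us into /root
```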


Configure the VM

Find out the default hostname of the VM and note it down.
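One way to do that, assuming a standard CentOS guest, is with the hostname command (the -f flag prints the fully qualified name, which Ambari will ask for later):

```shell
hostname -f
```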


Then we need to edit the ‘/etc/hosts’ file so that it has an entry for this hostname. Open ‘/etc/hosts’ in ‘vi’.


It needs to have the VM’s hostname added so that the name resolves locally.

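As a sketch, with hdp-vm.example.com standing in for whatever hostname your VM actually reported, the edited file would look like:

```
# /etc/hosts
127.0.0.1   localhost localhost.localdomain hdp-vm.example.com
::1         localhost localhost.localdomain
```

The point is that the VM’s own hostname must resolve locally, since the Ambari server and agents will address each other by that name.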

Now we will install the NTP service (Hadoop nodes need their clocks kept in sync) with the following command:

yum install ntp

Next we will install the wget utility, which we will later use to fetch the Ambari repository file:

yum install wget

Once these are installed, configure the NTP service to start at boot and start it now:

chkconfig ntpd on
service ntpd start


Setting up passwordless SSH

Generate a pair of keys: ssh-keygen


The keys will be placed in the ~/.ssh folder.

  • Copy the id_rsa file to the /vagrant folder so that you can access the private key from the host machine; /vagrant is automatically shared between the host and guest OSs.
  • Also append the public key to the authorized_keys file.
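Put together, the steps above might look like this sketch (ssh-keygen prompts for a passphrase; leave it empty so the login is truly passwordless):

```shell
ssh-keygen                                       # accept the default paths, empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # authorize the key for root itself
chmod 600 ~/.ssh/authorized_keys                 # sshd refuses keys with loose permissions
cp ~/.ssh/id_rsa /vagrant/                       # now reachable from the host's hdp_vm folder
```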


Set up Ambari

Download the Ambari repository file and copy it to /etc/yum.repos.d (the actual ambari.repo URL for your Ambari version is listed in the Hortonworks documentation; the placeholder below stands in for it):

wget <ambari-repo-url>
cp ambari.repo /etc/yum.repos.d

Double check that the repo has been configured correctly:
yum repolist


Now we are ready to install the bits from the repo:
yum install ambari-server


Now we can configure the bits. I just go with the defaults during the configuration:
ambari-server setup


Let’s spin up Ambari:
ambari-server start


Setting up the pseudo-cluster with Ambari

Now you can access Ambari from your host machine at the URL http://localhost:8080. The default username and password are both admin.


Name your cluster.


Select HDP 2.0.


Input the hostname of your VM and click on the Choose File button.


Select the private key file, which you can find in the folder you created at the beginning of this post.


Select the default options for the rest of the steps until you get to Customize Services. In this step, configure your preferred credentials, especially for the components marked with a white number on a red background.


Finish up the wizard.


Voila!!! We have our very own Hadoop VM.

Happy Hadooping!

