February 07, 2014

How to build a Hadoop VM with Ambari and Vagrant

In this post, we will explore how to quickly and easily spin up our own Hadoop VM with Vagrant and Apache Ambari. Vagrant is very popular with developers because it lets them mirror the production environment in a VM while keeping all their IDEs and tools in the comfort of the host OS.

If you’re just looking to get started with Hadoop in a VM, then you can simply download the Hortonworks Sandbox.


Spin up a VM with Vagrant

Create a folder for this VM: mkdir hdp_vm


If you have VirtualBox and Vagrant installed on your system, change into that folder and issue the following command:

vagrant box add hdp_vm


Once the box has been downloaded and added to your local box library under the name hdp_vm, issue the command:

vagrant init hdp_vm

This creates a file named ‘Vagrantfile’ in the folder. Open it in a text editor such as ‘vi’:


Edit the ‘Vagrantfile’ so that port 8080 on the VM (the Ambari web UI port) is forwarded to port 8080 on the host:


Let’s also modify the settings so that the VM is assigned adequate memory when it is launched:

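For reference, the two Vagrantfile edits can be sketched as a small shell function that writes a complete configuration-version-2 Vagrantfile. This is a sketch, not the exact file used in this post: it assumes the VirtualBox provider, the function name write_vagrantfile is hypothetical, and 4096 MB is an example memory size rather than a recommended figure. Note that the memory setting lives inside the config.vm.provider block; using vb outside that block is what produces an "undefined local variable or method vb" error.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal Vagrantfile (config version 2) for the hdp_vm box.
# Assumptions: VirtualBox provider; 4096 MB RAM is an example default only.
set -euo pipefail

write_vagrantfile() {
  local dir="${1:-.}" memory_mb="${2:-4096}"
  # Overwrites any existing Vagrantfile in $dir (e.g. the one from vagrant init).
  cat > "$dir/Vagrantfile" <<EOF
Vagrant.configure("2") do |config|
  config.vm.box = "hdp_vm"
  # Ambari web UI: guest port 8080 -> host port 8080
  config.vm.network "forwarded_port", guest: 8080, host: 8080
  # 'vb' only exists inside this provider block
  config.vm.provider "virtualbox" do |vb|
    vb.customize ["modifyvm", :id, "--memory", "${memory_mb}"]
  end
end
EOF
}
```

Running write_vagrantfile . 4096 inside the hdp_vm folder replaces the generated Vagrantfile. After that, vagrant up boots the VM, vagrant ssh logs you in as the vagrant user, and on most base boxes sudo -i gives a root shell in /root.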

We are now ready to launch the VM. Once it is running, SSH in, log in as root, and change to root’s home directory:


Configure the VM

Find out the default hostname of the VM and note it down:


Next, edit the ‘/etc/hosts’ file so that it contains an entry for this hostname. Open ‘/etc/hosts’ in ‘vi’; it might look like this:


It needs to look like this:

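A minimal sketch of the /etc/hosts change, assuming the goal is a line that maps the VM’s IP address to its fully qualified hostname plus a short alias. The function name add_hosts_entry and the example values are hypothetical; the file argument defaults to /etc/hosts but can point at a scratch copy so you can try it safely first.

```shell
#!/usr/bin/env bash
# Sketch: add "IP FQDN SHORTNAME" to a hosts file unless the FQDN is already mapped.
# The file defaults to /etc/hosts; pass another path to try it on a scratch copy.
set -euo pipefail

add_hosts_entry() {
  local ip="$1" fqdn="$2" file="${3:-/etc/hosts}"
  local short="${fqdn%%.*}"   # short alias: everything before the first dot

  # Look for the FQDN as an exact hostname field (fields 2..NF), skipping comments.
  if awk -v h="$fqdn" '
      $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }' "$file"; then
    return 0   # already present: leave the file untouched
  fi

  printf '%s\t%s\t%s\n' "$ip" "$fqdn" "$short" >> "$file"
}
```

Inside the VM, as root, a call would look like add_hosts_entry 10.0.2.15 "$(hostname)". 10.0.2.15 is the address VirtualBox usually gives a NAT guest, but treat it as an assumption and check yours with ip addr first.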

Now install the NTP service with the following command:

yum install ntp

Next, install the wget utility with the following command:

yum install wget

Once these are installed, enable the NTP service at boot and start it:

chkconfig ntpd on
service ntpd start

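The package and NTP steps can be folded into one non-interactive function, sketched here under the assumption of a CentOS 6 guest (yum, chkconfig and service) and a root shell. The -y flag answers yum’s confirmation prompt. The names run and provision_base and the DRY_RUN switch are hypothetical additions: with DRY_RUN=1 each command is printed instead of executed, so the sequence can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch of the NTP/wget steps as one function.
# Assumptions: CentOS 6 style tooling (yum, chkconfig, service), run as root.
# DRY_RUN=1 (hypothetical switch) prints commands instead of running them.
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

provision_base() {
  run yum install -y ntp wget   # -y: answer "yes" to yum's prompt
  run chkconfig ntpd on         # start ntpd automatically at boot
  run service ntpd start        # start ntpd now
}
```

In the VM you would call provision_base directly; DRY_RUN=1 provision_base only lists the three commands.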

Setting up passwordless SSH

Generate a key pair: ssh-keygen


The keys will be placed in the .ssh folder in root’s home directory.

  • Copy the id_rsa file to the /vagrant folder so that you can access the private key from the host machine; /vagrant is automatically shared between the host and guest OSs and maps to the hdp_vm folder on the host.
  • Also append the public key to the authorized_keys file.

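Here is a sketch of the same passwordless-SSH steps as a function. It assumes OpenSSH’s ssh-keygen, and takes the key directory and the shared folder as arguments (defaulting to ~/.ssh and /vagrant); the name setup_passwordless_ssh is hypothetical. It also tightens permissions on ~/.ssh and authorized_keys, because sshd can refuse public-key logins when those are too open, which is one possible cause of a "Permission denied (publickey,...)" error when Ambari confirms the host.

```shell
#!/usr/bin/env bash
# Sketch of the passwordless-SSH steps. Defaults assume a root shell in the
# guest (keys in ~/.ssh, shared folder /vagrant); both paths can be overridden.
set -euo pipefail

setup_passwordless_ssh() {
  local ssh_dir="${1:-$HOME/.ssh}" share_dir="${2:-/vagrant}"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"

  # Generate an RSA key pair with an empty passphrase, unless one already exists.
  if [[ ! -f "$ssh_dir/id_rsa" ]]; then
    ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa"
  fi

  # Append the public key to authorized_keys only if it is not already listed.
  local pub
  pub="$(cat "$ssh_dir/id_rsa.pub")"
  touch "$ssh_dir/authorized_keys"
  grep -qxF -- "$pub" "$ssh_dir/authorized_keys" \
    || printf '%s\n' "$pub" >> "$ssh_dir/authorized_keys"
  # sshd may ignore an authorized_keys file that other users can write to.
  chmod 600 "$ssh_dir/authorized_keys"

  # Copy the private key to the shared folder so the host can upload it to Ambari.
  cp "$ssh_dir/id_rsa" "$share_dir/"
}
```

Because the function skips existing keys and duplicate entries, running it twice leaves a single line in authorized_keys.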

Setup Ambari

Download the Ambari repository file and copy it to /etc/yum.repos.d:

cp ambari.repo /etc/yum.repos.d

Double-check that the repo has been configured correctly:

yum repolist


Now we are ready to install Ambari Server from the repo:

yum install ambari-server


Now we can configure Ambari Server. I simply accept the defaults during setup:

ambari-server setup


Let’s spin up Ambari:

ambari-server start

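The Ambari steps above can be combined into a single function, sketched here under a few assumptions: ambari.repo has already been downloaded to the current directory (its download location is not given in this post), and ambari-server setup -s, which runs setup silently with the default answers, is an acceptable stand-in for accepting the defaults interactively. install_ambari, run and DRY_RUN are hypothetical names; DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of the "Setup Ambari" steps, run as root in the guest.
# Assumptions: ambari.repo is already downloaded; "setup -s" (silent mode,
# default answers) replaces the interactive prompts.
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_ambari() {
  local repo_file="${1:-ambari.repo}"
  if [[ ! -f "$repo_file" ]]; then
    echo "install_ambari: $repo_file not found" >&2
    return 1
  fi
  run cp "$repo_file" /etc/yum.repos.d/
  run yum repolist                 # the Ambari repo should appear in this list
  run yum install -y ambari-server
  run ambari-server setup -s       # silent setup, default answers
  run ambari-server start
}
```

Checking for the repo file up front means the function fails early with a clear message instead of letting yum report that ambari-server cannot be found.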

Setting up the pseudo-cluster with Ambari

Now you can access Ambari from your host machine at http://localhost:8080. The username and password are admin and admin, respectively:

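If the page does not load right away, the server may still be starting. A small polling loop run on the host can wait for it. This sketch assumes curl is available and that the Ambari UI answers with HTTP 200 once it is up; the function name wait_for_url, the attempt count and the delay are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until it returns HTTP 200 or the attempts run out.
# Returns 0 on success, 1 on timeout. Attempt count and delay are example defaults.
set -euo pipefail

wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-5}" i code
  for ((i = 1; i <= attempts; i++)); do
    # -s: no progress output; -o /dev/null: discard the body;
    # -w: print only the status code ("000" if no connection was made).
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
    if [[ "$code" == "200" ]]; then
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, wait_for_url http://localhost:8080 60 5 && echo "Ambari is up" waits up to about five minutes before giving up.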

Name your cluster:


Select HDP 2.0:


Enter the hostname of your VM (the one you noted earlier) and click the Choose File button:


Select the private key file (id_rsa), which you will find in the folder you created at the beginning of this post:


Accept the default options for the remaining steps until you reach Customize Services. In this step, configure your preferred credentials, especially for the components marked with a white number on a red background:


Finish up the wizard.


Voilà! We have our very own Hadoop VM.

Happy Hadooping!




Carolus Holman says:

I got all the way to the confirm hosts page and receive the error message:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
lost connection
scp /usr/lib/python2.6/site-packages/ambari_server/ done for host, exitcode=1
Copying os type check script finished
ERROR: Bootstrap of host fails because previous action finished with non-zero exit code (1)

Any idea how I can either redo the SSH Key or ?


Ron Gonzalez says:

Thanks for this.
It looks like this guest will only work from Ambari. With the current configuration, which uses NAT (the default in the Vagrantfile generated by vagrant init), you will not be able to submit jobs, write files to HDFS, etc. on the guest. You won’t even be able to reach port 8088 of the ResourceManager.
I was only able to make this work by removing the line from /etc/hosts altogether and enabling bridged networking by adding the following to the generated Vagrantfile: :public_network, bridge: "eth0", adapter: 1

This assumes v2 of the Vagrantfile configuration

Anonimo says:

When I make the changes to the vagrant file I get the error:

rtinocos-MacBook-Pro:hdp_vm rtinoco$ vagrant up
/Users/rtinoco/hdp_vm/Vagrantfile:52:in `block in ‘: undefined local variable or method `vb’ for main:Object (NameError)
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/v2/loader.rb:37:in `call’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/v2/loader.rb:37:in `load’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/loader.rb:104:in `block (2 levels) in load’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/loader.rb:98:in `each’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/loader.rb:98:in `block in load’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/loader.rb:95:in `each’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/loader.rb:95:in `load’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/environment.rb:265:in `config_global’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/environment.rb:519:in `block in action_runner’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/action/runner.rb:36:in `call’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/action/runner.rb:36:in `run’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/environment.rb:283:in `hook’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/environment.rb:139:in `initialize’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/bin/vagrant:105:in `new’
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/bin/vagrant:105:in `’
from /Applications/Vagrant/bin/../embedded/gems/bin/vagrant:23:in `load’
from /Applications/Vagrant/bin/../embedded/gems/bin/vagrant:23:in `’

Any ideas on how I can resolve this?

Sheldon Kreger says:

I had to hit the server with curl a few times before my local machine would connect to it.

From within the VM:

curl http://localhost:8080

First request failed, second request worked. Now I can load localhost:8080 on my host machine.

Brian de la Motte says:

Excellent article! I am glad this material was available! Very easy to follow along and I love using Vagrant! The only thing I think this guide might be missing is installing and starting httpd.
I had to run the following:
yum install -y httpd
chkconfig httpd on
service httpd start

Other than that, everything went pretty smoothly!

mike says:

It looks good. I will try it tonight.

I have a MacBook Air with 4 GB of RAM. Can I run this Vagrant box well? How much RAM do I need to run this Vagrant box?

Divya says:

After running ambari-server start, you must turn off iptables to access the web server:

chkconfig iptables off
/etc/init.d/iptables stop

Malik says:

I’m running into an error when starting ambari-server. I’m pasting an excerpt from ambari-server.log.

Any ideas how to solve this problem?

15:58:46,015 INFO [main] Configuration:350 – Reading password from existing file
15:58:46,045 INFO [main] Configuration:530 – Hosts Mapping File null
15:58:46,045 INFO [main] HostsMap:60 – Using hostsmap file null
15:58:57,420 INFO [main] Configuration:429 – Credential provider creation failed. Reason: Master key initialization failed.
15:58:58,970 INFO [main] AmbariServer:455 – Getting the controller
15:59:03,993 INFO [main] CertificateManager:68 – Initialization of root certificate
15:59:03,993 INFO [main] CertificateManager:70 – Certificate exists:true
15:59:04,855 INFO [main] AmbariServer:125 – ********* Meta Info initialized **
15:59:04,866 INFO [main] ClustersImpl:104 – Initializing the ClustersImpl
15:59:10,117 ERROR [main] AmbariServer:465 – Failed to run the Ambari Server

AmbariServer:465 – Failed to run the Ambari Server Guice provision errors:

1) Error injecting constructor,
at org.apache.ambari.server.bootstrap.BootStrapImpl.(

Caused by:

Mohsan says:

My browser doesn’t open the Ambari page when I get to the “Setting up the pseudo-cluster with Ambari” stage.

Any ideas on what’s causing this?

Dan says:

How is this different than just using the Sandbox VM?

Sean says:

Looks like a very nice tutorial; the only problem is that my company blocks GitHub. Is there a way around this issue?

Dan says:

Very nice tutorial. The only issue I’m encountering is that, in the AMBARI-SERVER SETUP step, the 81 MB JDK (.bin) stops downloading at some point (say, at 22%). So it’ll exit, I’ll try again, and it’ll stop downloading at a different point (say, at 37%).
I’ve tried it repeatedly, but the best I’ve seen is the download stopping at 90%. However, it’s the same error every time:

ERROR: Exiting with exit code 1. Reason: Downloading or installing JDK failed: ‘Fatal exception: Size of downloaded JDK distribution file is XXXXXXX bytes, it is probably damaged or incomplete, exit code 1’. Exiting.

Very disheartening because I was looking forward to getting through the entire tutorial.

Srikanth says:

I’m facing the issue “Guice provision errors”. Can you please help me here? It’s urgent.

John says:

Hi, do you know how this tutorial could be extended to a multi-node cluster on HDP 2.2? Is there another tutorial more appropriate to multi-node cluster creation? I understand it is quite a hassle to assemble a cluster from several sandbox VMs. (I am using the sandbox together with Teradata Aster, so I am bound to a particular version, i.e. 2.2, and I struggled a lot to make them work together; hence I am afraid that if I build the cluster and the nodes myself I might run into integration issues.) So I am looking for the easiest way to build a small cluster of at most 3 nodes (even better if anybody could point me to a place where I could download such an already-built cluster in VMs, provided it’s HDP 2.2). Thanks, John
