February 07, 2014

How to build a Hadoop VM with Ambari and Vagrant

In this post, we will explore how to quickly and easily spin up our own VM with Vagrant and Apache Ambari. Vagrant is popular with developers because it lets you mirror the production environment in a VM while keeping all your IDEs and tools in the comfort of the host OS.

If you’re just looking to get started with Hadoop in a VM, then you can simply download the Hortonworks Sandbox.


Spin up a VM with Vagrant

Create a folder for this VM: mkdir hdp_vm

If you have VirtualBox and Vagrant installed on your system, change into that folder and issue the following command:

vagrant box add hdp_vm

Once the download has completed and the box has been added to your library of boxes under the name hdp_vm, issue the command:

vagrant init hdp_vm

This will create a file ‘Vagrantfile’ in the folder. Open it in a text editor like ‘vi’:

Edit the ‘Vagrantfile’ so that port 8080 on the VM is forwarded to port 8080 on the host.

Let’s also modify the settings so that the VM is assigned adequate memory once it is launched.
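The two Vagrantfile changes above might look like the following sketch. It assumes the version 2 Vagrantfile syntax and the VirtualBox provider. The helper name write_vagrantfile is hypothetical, and the 2048 MB memory figure is an assumed value, since the original settings are not preserved here.

```shell
# Hypothetical helper: writes a minimal Vagrantfile to the given path
# (defaults to ./Vagrantfile, replacing the one created by 'vagrant init').
write_vagrantfile() {
  cat > "${1:-Vagrantfile}" <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "hdp_vm"

  # Forward the Ambari web UI: guest port 8080 -> host port 8080.
  config.vm.network :forwarded_port, guest: 8080, host: 8080

  # VirtualBox-specific settings; 'vb' is only defined inside this block.
  config.vm.provider :virtualbox do |vb|
    # 2048 MB is an assumed value; adjust it to what your host can spare.
    vb.customize ["modifyvm", :id, "--memory", "2048"]
  end
end
EOF
}
```

Run write_vagrantfile inside the hdp_vm folder before launching the VM. Note that the vb variable exists only inside the provider block, so uncommenting a vb.customize line on its own fails with a NameError.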

We are ready to launch the VM with vagrant up. Once the VM is launched, SSH in with vagrant ssh, then log in as root and change to root’s home directory.

Configure the VM

Find out the default hostname of the VM with the hostname command and note it down.

Then we need to edit the ‘/etc/hosts’ file so that it has an entry for this hostname. Open ‘/etc/hosts’ in ‘vi’ and make sure one line maps the VM’s IP address to the hostname you noted down.
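If you would rather script the ‘/etc/hosts’ change than edit the file by hand, a minimal sketch follows. The function name add_hosts_entry is hypothetical, and the IP address you pass is an assumption you need to verify on your own VM (for example with ifconfig).

```shell
# Hypothetical helper: append "<ip> <hostname>" to a hosts file
# unless a non-comment line already lists that hostname.
add_hosts_entry() {
  ip="$1"
  name="$2"
  hosts_file="${3:-/etc/hosts}"

  # awk exits 0 if any field after the address equals the hostname.
  if ! awk -v h="$name" '
        !/^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END   { exit !found }' "$hosts_file"; then
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
  fi
}

# Usage inside the VM, as root (replace the address with your VM's own):
#   add_hosts_entry 10.0.2.15 "$(hostname)"
```

Calling it twice with the same hostname leaves a single entry, so it is safe to rerun.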

Now we will install the NTP service with the following command:

yum install ntp

Next we will install the wget utility:

yum install wget

Once these are installed, enable the NTP service at boot and start it:

chkconfig ntpd on
service ntpd start

Setting up passwordless SSH

Get a pair of keys: ssh-keygen

The keys will be placed in the .ssh folder.

  • Copy the id_rsa file to the /vagrant folder so that you can access the private key from the host machine; /vagrant is automatically shared between the host and guest OSs.
  • Also append the public key to the authorized_keys file.
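The key steps above can be sketched as one shell function. The function name setup_passwordless_ssh is hypothetical. The empty passphrase and the explicit permission changes are assumptions: they are common practice for keys used by automated tools, but are not stated in the steps above.

```shell
# Hypothetical helper: create a key pair if needed, copy the private key
# to the shared folder, and authorize the public key.
setup_passwordless_ssh() {
  ssh_dir="${1:-$HOME/.ssh}"
  shared_dir="${2:-/vagrant}"

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"

  # Generate an RSA key pair with an empty passphrase unless one exists.
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -t rsa -N "" -f "$ssh_dir/id_rsa"
  fi

  # Make the private key reachable from the host via the shared folder.
  cp "$ssh_dir/id_rsa" "$shared_dir/id_rsa"

  # Authorize the public key for logins to this account.
  cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  # With its default StrictModes setting, sshd rejects key files
  # that other users can write to, so tighten the permissions.
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage inside the VM, as root:
#   setup_passwordless_ssh
```

If the Confirm Hosts step later fails with "Permission denied (publickey,...)", the permissions on .ssh and authorized_keys are one of the first things worth checking.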

Setup Ambari

Download the Ambari repository file (ambari.repo) and copy it to /etc/yum.repos.d:

cp ambari.repo /etc/yum.repos.d

Double check that the repo has been configured correctly:
yum repolist

Now we are ready to install the bits from the repo:
yum install ambari-server

Now we can configure the bits. I just go with the defaults during the configuration:
ambari-server setup

Let’s spin up Ambari:
ambari-server start

Setting up the pseudo-cluster with Ambari

Now you can access Ambari from your host machine at the URL http://localhost:8080. The username and password are both admin.
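The Ambari web UI can take a little while to answer after the server starts. A small polling loop with curl is one way to wait for it; this is a sketch, wait_for_url is a hypothetical helper name, and the attempt count and delay are arbitrary defaults.

```shell
# Hypothetical helper: poll a URL until it answers or attempts run out.
# Returns 0 on the first successful response, 1 otherwise.
wait_for_url() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-3}"

  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: quiet, -f: fail on HTTP errors, -o /dev/null: discard the body.
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage, from the host or from inside the VM:
#   wait_for_url http://localhost:8080 && echo "Ambari is reachable"
```

If the URL answers inside the VM but not from the host, the port forwarding or the guest firewall is the more likely culprit.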

Name your cluster.

Select HDP 2.0.

Input the hostname of your VM and click on the Choose File button.

Select the private key file, which you can find in the folder you created at the beginning of this post.

Select the default options for the rest of the steps until you get to Customize Services. In this step, configure your preferred credentials, especially for the components marked with a white number on a red background.

Finish up the wizard.

Voila!!! We have our very own Hadoop VM.

Happy Hadooping!




  • I got all the way to the confirm hosts page and received the error message:
    Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
    lost connection
    scp /usr/lib/python2.6/site-packages/ambari_server/ done for host, exitcode=1
    Copying os type check script finished
    ERROR: Bootstrap of host fails because previous action finished with non-zero exit code (1)

    Any idea how I can either redo the SSH Key or ?


  • Thanks for this.
    Looks like this guest will only work from Ambari. With this current configuration and using NAT (the default in the Vagrantfile generated by vagrant init), you will not be able to submit jobs, write files to HDFS, etc. on the guest. You won’t even be able to reach port 8088 of the resource manager.
    I was only able to make this work by getting rid of the line from /etc/hosts altogether and enabling bridged networking by adding the following to the generated Vagrantfile: :public_network, bridge: "eth0", adapter: 1

    This assumes v2 of the Vagrantfile configuration

  • When I make the changes to the vagrant file I get the error:

    rtinocos-MacBook-Pro:hdp_vm rtinoco$ vagrant up
    /Users/rtinoco/hdp_vm/Vagrantfile:52:in `block in ‘: undefined local variable or method `vb’ for main:Object (NameError)
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/v2/loader.rb:37:in `call’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/v2/loader.rb:37:in `load’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/loader.rb:104:in `block (2 levels) in load’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/loader.rb:98:in `each’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/loader.rb:98:in `block in load’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/loader.rb:95:in `each’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/config/loader.rb:95:in `load’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/environment.rb:265:in `config_global’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/environment.rb:519:in `block in action_runner’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/action/runner.rb:36:in `call’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/action/runner.rb:36:in `run’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/environment.rb:283:in `hook’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/lib/vagrant/environment.rb:139:in `initialize’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/bin/vagrant:105:in `new’
    from /Applications/Vagrant/embedded/gems/gems/vagrant-1.4.3/bin/vagrant:105:in `’
    from /Applications/Vagrant/bin/../embedded/gems/bin/vagrant:23:in `load’
    from /Applications/Vagrant/bin/../embedded/gems/bin/vagrant:23:in `’

    Any Ideas on how I can resolve this?

  • I had to hit the server with curl a few times before my local machine would connect to it.

    From within the VM:

    curl http://localhost:8080

    First request failed, second request worked. Now I can load localhost:8080 on my host machine.

  • Excellent article! I am glad this material was available! Very easy to follow along and I love using Vagrant! The only thing I think this guide might be missing is installing and starting httpd.
    I had to run the following:
    yum install -y httpd
    chkconfig httpd on
    service httpd start

    Other than that, everything went pretty smoothly!

  • It looks good. I will try it at night.

    I have a Mac Air with 4 GB of RAM. Can I run this Vagrant box well? How much RAM do I need to run this Vagrant box?

  • After the command ambari-server start
    you must turn off iptables to access the web server.

    chkconfig iptables off
    /etc/init.d/iptables stop

  • I’m running into an error when starting the ambari-server. I’m pasting an excerpt from the ambari-server.log.

    Any ideas how to solve this problem?

    15:58:46,015 INFO [main] Configuration:350 – Reading password from existing file
    15:58:46,045 INFO [main] Configuration:530 – Hosts Mapping File null
    15:58:46,045 INFO [main] HostsMap:60 – Using hostsmap file null
    15:58:57,420 INFO [main] Configuration:429 – Credential provider creation failed. Reason: Master key initialization failed.
    15:58:58,970 INFO [main] AmbariServer:455 – Getting the controller
    15:59:03,993 INFO [main] CertificateManager:68 – Initialization of root certificate
    15:59:03,993 INFO [main] CertificateManager:70 – Certificate exists:true
    15:59:04,855 INFO [main] AmbariServer:125 – ********* Meta Info initialized **
    15:59:04,866 INFO [main] ClustersImpl:104 – Initializing the ClustersImpl
    15:59:10,117 ERROR [main] AmbariServer:465 – Failed to run the Ambari Server

    AmbariServer:465 – Failed to run the Ambari Server Guice provision errors:

    1) Error injecting constructor,
    at org.apache.ambari.server.bootstrap.BootStrapImpl.(

    Caused by:

  • My browser doesn’t open the Ambari page when I get to the “Setting up the pseudo-cluster with Ambari” stage.

    Any ideas on what’s causing this?

  • Very nice tutorial. The only issue I’m encountering is that, in the AMBARI-SERVER SETUP step, the 81 MB JDK (.bin) stops downloading at some point (say, at 22%). So it’ll exit, I’ll try again, and it’ll stop downloading at a different point (say, at 37%).
    I try it repeatedly, but the best I’ve experienced is the download stopping at 90%. However, it’s the same error every time:

    ERROR: Exiting with exit code 1. Reason: Downloading or installing JDK failed: ‘Fatal exception: Size of downloaded JDK distribution file is XXXXXXX bytes, it is probably damaged or incomplete, exit code 1’. Exiting.

    Very disheartening because I was looking forward to getting through the entire tutorial.

  • Hi, do you know how this tutorial could be extended to a multi-node cluster on HDP 2.2? Is there another tutorial more appropriate to this subject of multi-node cluster creation? I understood that is quite a hassle to assemble a cluster from several VM sandboxes (although I am using the sandbox together with Teradata Aster and I am bound to a particular version, i.e. 2.2 and I struggled a lot to make them work together hence I am afraid that if I build by myself the cluster and the nodes I might run into integration issues)…so I am looking for the easiest way to build a small cluster of maximum 3 nodes (even better if anybody could point me to a place where I could download such already-built cluster in VMs, provided it’s a 2.2 HDP). Thanks John
