How to run Hadoop on OpenStack with Hortonworks Sandbox

The Hortonworks Sandbox is a great tool not only for learning Hadoop, but also for experimentation and application development. Deployment in a type 2 hypervisor such as Oracle VirtualBox or VMware Workstation is straightforward and serves the needs of a single user. Sandbox can also be deployed to IaaS environments, and this article walks through the steps of deploying Hortonworks Sandbox on OpenStack. For the purposes of this article, the author used the OpenStack Grizzly release running QEMU-KVM as the underlying hypervisor. Since QEMU-KVM in this setup does not directly support VMDK images, the Sandbox VMDK image must be converted to a supported format; in this case we will use the qcow2 format.

Pre-Requisites:

The approach described in this article makes use of both Oracle VirtualBox and qemu-img. Requirements include:

  • Oracle VirtualBox 4.x (must be installed)
  • qemu-img conversion tool (likely available in the OpenStack environment)
  • OpenStack installation (versions Essex, Folsom or Grizzly should work fine) with access to nova and glance.

Step 1: Download Hortonworks Sandbox

Download the Hortonworks Sandbox 1.3 (VirtualBox). The end result of this step should be a file named: Hortonworks+Sandbox+1.3+VirtualBox+RC6.ova

Step 2: Extract the downloaded .ova file

An .ova file is a tar archive, so it can be extracted in multiple ways, including WinZip, rar and tar. Executing the following tar command will do the trick:

tar -xvf Hortonworks+Sandbox+1.3+VirtualBox+RC6.ova
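If you prefer to keep the extracted files out of your working directory, the extraction can be wrapped in a small helper. This is only a sketch: extract_ova is a hypothetical name, and the destination directory is whatever you choose.

```shell
# Extract an .ova archive into a directory and print the path of every
# .vmdk disk image found there. extract_ova is a hypothetical helper name.
extract_ova() {
    ova=$1
    dest=${2:-.}
    mkdir -p "$dest" || return 1
    # An .ova is a plain tar archive, so tar can unpack it directly.
    tar -xf "$ova" -C "$dest" || return 1
    find "$dest" -name '*.vmdk' -print
}

# Usage (not run here):
#   extract_ova Hortonworks+Sandbox+1.3+VirtualBox+RC6.ova sandbox
```

The printed path is the disk image that the conversion steps work on.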

The end result is two files, shown in the screenshot below:

[Screenshot: the two extracted files]

The file of importance is the Hortonworks-Sandbox-1.3-VirtualBox-disk1.vmdk file which contains the Hortonworks Sandbox disk image that will be converted to the appropriate format in the steps below.

Step 3: Convert the .vmdk disk image to qcow2

In a perfect world, the VMDK image could be converted directly to a qcow2 or raw image. In my case, qemu-img did not support the VMDK format, although this may vary by installation. To find out which formats your qemu-img supports, run the "qemu-img" command with no arguments; the supported formats are listed at the bottom of the help message. I was therefore required to first convert the VMDK file to VDI format and then convert the VDI file to qcow2. The VBoxManage command will take care of converting the VMDK file into VDI format:

VBoxManage clonehd Hortonworks-Sandbox-1.3-VirtualBox-disk1.vmdk Hortonworks-Sandbox-1.3.vdi --format VDI

The output of the above command is Hortonworks-Sandbox-1.3.vdi which can now be converted to qcow2 format. Note: If VBoxManage complains about an incorrect UUID for the image, this means that the image is already registered with VirtualBox and the image must be unregistered from VirtualBox using the Virtual Media Manager (simply remove the .VMDK image but keep the file on disk).
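To check whether the VDI detour is needed at all on your installation, you can scan the qemu-img help text for a given format instead of reading it by eye. The sketch below assumes the help text contains a line starting with "Supported formats:", which may not hold for every qemu-img build. has_format is a hypothetical helper name.

```shell
# Succeeds if the format named in $1 appears on the "Supported formats:"
# line of the qemu-img help text read from stdin.
# has_format is a hypothetical helper; the line prefix is an assumption.
has_format() {
    sed -n 's/^Supported formats://p' | tr -s ' ' '\n' | grep -qx -- "$1"
}

# Usage (not run here):
#   qemu-img --help 2>&1 | has_format vmdk && echo "direct conversion possible"
```

If vmdk is listed, converting the VMDK file straight to qcow2 with qemu-img may work and the VBoxManage step can be skipped.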

In order to convert the Hortonworks-Sandbox-1.3.vdi file to qcow2, qemu-img is used:

qemu-img convert -O qcow2 Hortonworks-Sandbox-1.3.vdi Hortonworks-Sandbox-1.3.qcow2
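Before uploading the result, it is worth confirming that the output file really is qcow2. A minimal sketch, assuming the "file format:" line that qemu-img info normally prints. image_format is a hypothetical helper name.

```shell
# Print the value of the "file format:" line from `qemu-img info` output
# read on stdin. image_format is a hypothetical helper name.
image_format() {
    sed -n 's/^file format: *//p'
}

# Usage (not run here):
#   qemu-img info Hortonworks-Sandbox-1.3.qcow2 | image_format
```

Anything other than qcow2 here suggests the conversion did not do what was intended.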

Step 4: Register image with OpenStack Glance

The Glance OpenStack service provides mechanisms to register, store and retrieve VM images and metadata. It is the service that we will use to make the Hortonworks Sandbox image available for boot within your OpenStack environment. Assuming that you have set the credentials in your OpenStack environment properly, issue the following command to register the image with Glance:

glance image-create --name Hortonworks-Sandbox-1.3 --is-public=true --container-format=bare --disk-format qcow2 < Hortonworks-Sandbox-1.3.qcow2

Note that the above command can be executed on any host where the OpenStack glance python client has been installed.
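A quick way to confirm that credentials are in place before calling glance is to check the environment variables the client reads. The variable names below are an assumption: they are the ones commonly used by the command-line clients of this era, and your installation may use others. check_os_env is a hypothetical helper name.

```shell
# Report which of the usual OpenStack credential variables are unset.
# The variable list is an assumption; adjust it to your installation.
# check_os_env is a hypothetical helper name.
check_os_env() {
    missing=0
    for var in OS_USERNAME OS_PASSWORD OS_TENANT_NAME OS_AUTH_URL; do
        # Look up the variable whose name is stored in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "missing: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (not run here):
#   check_os_env && echo "credentials look complete"
```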

Step 5: Boot Sandbox

Since the Hortonworks Sandbox uses an expandable file system which could get large, it is best to use a flavor that provisions enough ephemeral disk space to satisfy this requirement. In a default OpenStack setup, this is the m1.large flavor. To boot the image, use the OpenStack Horizon UI or execute the following command from the command line:

nova boot --flavor m1.large --image Hortonworks-Sandbox-1.3 --key-name default mySandbox

The screenshot below shows the output of the above command.

[Screenshot: output of the nova boot command]

Note that the --key-name argument above will need to change depending on which key pair you want to boot your image with. It is only necessary if you want password-less ssh access to the Hortonworks Sandbox.
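Since the key pair is optional, the boot command can be wrapped so that --key-name is only passed when a key pair name is given. This is a sketch only. boot_sandbox is a hypothetical helper name, and the flavor and image names are simply the ones used in this article.

```shell
# Boot a Sandbox instance, passing --key-name only when a key pair name
# is supplied as the second argument. boot_sandbox is a hypothetical
# helper; flavor and image names are taken from this article.
boot_sandbox() {
    name=$1
    key=${2:-}
    if [ -n "$key" ]; then
        nova boot --flavor m1.large --image Hortonworks-Sandbox-1.3 --key-name "$key" "$name"
    else
        nova boot --flavor m1.large --image Hortonworks-Sandbox-1.3 "$name"
    fi
}

# Usage (not run here):
#   boot_sandbox mySandbox default   # with key pair "default"
#   boot_sandbox mySandbox           # without a key pair
```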

Step 6: Access Sandbox

In order to access Hortonworks Sandbox through ssh and http, you must ensure that the security group assigned to the instance opens the following ports: 22 (for ssh access), 8888 (for Sandbox itself) and 8000 (for Hue).
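With the nova client, the three ports can be opened in a loop. The sketch below assumes the instance uses the security group named default and that the secgroup-add-rule subcommand is available in your client version. open_sandbox_ports is a hypothetical helper name.

```shell
# Add one TCP rule per Sandbox port to a security group.
# Arguments: group name (default: "default") and source CIDR
# (default: 0.0.0.0/0, which allows access from anywhere).
# open_sandbox_ports is a hypothetical helper name.
open_sandbox_ports() {
    group=${1:-default}
    cidr=${2:-0.0.0.0/0}
    for port in 22 8888 8000; do
        nova secgroup-add-rule "$group" tcp "$port" "$port" "$cidr" || return 1
    done
}

# Usage (not run here):
#   open_sandbox_ports default 10.0.0.0/8
```

Passing a narrower CIDR than the default is advisable if the instance is reachable from the public internet.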

Lastly, in order to access the instance, a public IP address must be made available. Depending on the OpenStack configuration, this may be performed automatically when nova provisions the instance. Typically, however, it is done by assigning a floating IP address and using the assigned address to access Sandbox, e.g. http://floating_ip:8888

Assigning a floating IP address can be done in one of two ways: through the command line using the nova python client or through the Horizon UI. It is also possible that floating IP addresses are assigned automatically upon instance provisioning in which case there is no extra step to be completed.

Floating IP assignment via nova python client

When assigning a floating IP to an instance, it is first necessary to get a list of available floating IPs which can be done by executing the following command:

[Screenshot: floating IP list command and its output]
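The listing command itself is not reproduced in the text. With the nova Python client of this era it was most likely nova floating-ip-list, but treat that as an assumption and check the help output of your client. list_floating_ips is a hypothetical wrapper name.

```shell
# List the floating IPs allocated to the current tenant.
# Assumes the floating-ip-list subcommand of the nova client;
# list_floating_ips is a hypothetical wrapper name.
list_floating_ips() {
    nova floating-ip-list
}

# Usage (not run here):
#   list_floating_ips
```

If the list comes back empty, nova floating-ip-create should allocate a new address from the pool, provided your quota allows it.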

Select any of the available Floating IPs and assign it to your Sandbox instance:

nova add-floating-ip mySandbox 172.18.3.1

Floating IP assignment via Horizon console

For those more comfortable with a graphical means of working with OpenStack, assigning a public IP address can easily be accomplished through the Horizon UI, by navigating to the instances menu and selecting Associate Floating IP under the “More…” menu for the Sandbox instance as illustrated in the below screenshot:

[Screenshot: Associate Floating IP in the Horizon instances menu]

Final Thought

Making Hortonworks Sandbox available in your OpenStack environment is a great way of quickly creating instances of Hortonworks Sandbox for a large number of users without requiring a desktop virtualization solution. Have fun!

A note on tutorials: Some Hortonworks Sandbox tutorials mention specific IP addresses. Where such an address refers to the Hortonworks Sandbox itself (e.g. 127.0.0.1 or 172.16.124.128), substitute the public IP address assigned to the OpenStack instance.


Comments

Karthik Krishnamoorthy | September 23, 2013 at 9:44 am

This is great for a single-node cluster. How do I do this for a multi-node cluster, especially if you want to set up dev VMs for your developers and dev environments?

    October 3, 2013 at 8:25 am

    You can provision the hardware required using the Nova API and then provision software elements using Ambari…

    HTH,

    Bruce

August 15, 2013 at 2:50 pm

You can also use Project Savanna to accomplish the implementation. Their implementation includes a decomposed image. HP Cloud Services has tested that approach successfully.
