How to run Hadoop on OpenStack with Hortonworks Sandbox
The Hortonworks Sandbox is a great tool not only for learning Hadoop but also for experimentation and application development. Deployment in a type 2 hypervisor such as Oracle VirtualBox or VMware Workstation is straightforward and serves the needs of a single user. Sandbox can also be deployed to IaaS environments, and this article walks through the steps of deploying Hortonworks Sandbox on OpenStack. For the purposes of this article, the author used the OpenStack Grizzly release running QEMU-KVM as the underlying hypervisor. Since QEMU-KVM did not directly support VMDK images in the author's environment, the Sandbox VMDK image must be converted to a supported format; in this case we will use the qcow2 format.
The approach described in this article makes use of both Oracle VirtualBox and qemu-img. Requirements include:
- Oracle VirtualBox 4.x (must be installed)
- qemu-img conversion tool (likely available in the OpenStack environment)
- OpenStack installation (versions Essex, Folsom or Grizzly should work fine) with access to nova and glance.
Step 1: Download Hortonworks Sandbox
Download the Hortonworks Sandbox 1.3 (VirtualBox). The end result of this step should be a file named Hortonworks+Sandbox+1.3+VirtualBox+RC6.ova (the name used by the tar command in Step 2).
Step 2: Unzip the downloaded .ova file
Unzipping the image can be done in multiple ways, including WinZip, rar and tar. Executing the following tar command will do the trick:
tar -xvf Hortonworks+Sandbox+1.3+VirtualBox+RC6.ova
The end result will be two files shown in the below screenshot:
The file of importance is Hortonworks-Sandbox-1.3-VirtualBox-disk1.vmdk, which contains the Hortonworks Sandbox disk image that will be converted to the appropriate format in the steps below.
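Because an .ova file is an ordinary tar archive, it can help to list its members before extracting, which confirms the exact name of the .vmdk file. The following is a minimal sketch; the function name unpack_ova and the destination directory argument are illustrative and not part of the original procedure.

```shell
# unpack_ova ARCHIVE DEST_DIR
# Lists the members of an .ova (a plain tar archive), then extracts them into DEST_DIR.
unpack_ova() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # -t lists members without extracting; handy for spotting the .vmdk name.
    tar -tf "$archive" || return 1
    # -x extracts; -C switches to the destination directory first.
    tar -xf "$archive" -C "$dest"
}

# Example (archive name taken from the tar command in this step):
# unpack_ova "Hortonworks+Sandbox+1.3+VirtualBox+RC6.ova" ./sandbox
```

Extracting into a dedicated directory keeps the large disk image separate from the downloaded archive, which makes cleanup easier after the conversion.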
Step 3: Convert the .vmdk disk image to qcow2
In a perfect world, the VMDK image could be converted directly to a qcow2 or raw image. In my case, qemu-img did not support the VMDK format, although this may vary by installation. To find out which formats your qemu-img supports, issue the “qemu-img” command without arguments; the formats are listed at the bottom of the help message. Because of this limitation, I first had to convert the VMDK file to VDI format and then convert the VDI file to qcow2. The VBoxManage command takes care of converting the VMDK file to VDI format:
VBoxManage clonehd Hortonworks-Sandbox-1.3-VirtualBox-disk1.vmdk Hortonworks-Sandbox-1.3.vdi --format VDI
The output of the above command is Hortonworks-Sandbox-1.3.vdi which can now be converted to qcow2 format. Note: If VBoxManage complains about an incorrect UUID for the image, this means that the image is already registered with VirtualBox and the image must be unregistered from VirtualBox using the Virtual Media Manager (simply remove the .VMDK image but keep the file on disk).
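The same unregistration can likely be done from the command line instead of the Virtual Media Manager. The sketch below assumes the VirtualBox 4.x subcommands "list hdds" and "closemedium disk"; the VBOXMANAGE variable and the function name unregister_disk are illustrative, added only so the call can be dry-run.

```shell
# Command to invoke; overridable so the function can be dry-run (e.g. VBOXMANAGE=echo).
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# unregister_disk PATH
# Shows the registered hard disks, then removes PATH from the VirtualBox media registry.
# Without the --delete option, closemedium leaves the file itself on disk.
unregister_disk() {
    "$VBOXMANAGE" list hdds || return 1
    "$VBOXMANAGE" closemedium disk "$1"
}

# Example:
# unregister_disk Hortonworks-Sandbox-1.3-VirtualBox-disk1.vmdk
```

After unregistering, the clonehd command from this step can be run again against the same .vmdk file.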
In order to convert the Hortonworks-Sandbox-1.3.vdi file to qcow2, qemu-img is used:
qemu-img convert -O qcow2 Hortonworks-Sandbox-1.3.vdi Hortonworks-Sandbox-1.3.qcow2
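The conversion in this step can be bracketed by two quick checks: whether qemu-img lists a given format in its help text, and whether the output file starts with the QCOW magic bytes. This is a hedged sketch rather than part of the original procedure; the function names are illustrative, and the first check assumes the help text contains a line beginning with "Supported formats:".

```shell
# format_supported FORMAT
# Reads qemu-img help text on stdin and succeeds if FORMAT appears
# as a word on the "Supported formats:" line.
format_supported() {
    grep 'Supported formats:' | tr ' ' '\n' | grep -qx "$1"
}

# is_qcow FILE
# Succeeds if FILE begins with the QCOW magic bytes: "QFI" followed by 0xfb.
# This checks only the 4-byte header, not the integrity of the image.
is_qcow() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "514649fb" ]
}

# Examples:
# qemu-img --help 2>&1 | format_supported vmdk && echo "direct VMDK conversion may work"
# is_qcow Hortonworks-Sandbox-1.3.qcow2 && echo "output looks like a QCOW image"
```

Where qemu-img is installed, "qemu-img info Hortonworks-Sandbox-1.3.qcow2" gives a fuller report, including the detected format and virtual size.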
Step 4: Register image with OpenStack Glance
The Glance OpenStack service provides mechanisms to register, store and retrieve VM images and metadata, and it is the service that we will use to make the Hortonworks Sandbox image available for boot within your OpenStack environment. Assuming that you have set the credentials in your OpenStack environment properly, issue the following command to register the image with Glance:
glance image-create --name Hortonworks-Sandbox-1.3 --is-public=true --container-format=bare --disk-format qcow2 < Hortonworks-Sandbox-1.3.qcow2
Note that the above command can be executed on any host where the OpenStack glance python client has been installed.
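Since the image is large, it may be worth confirming that the upload was not truncated. To my understanding, Glance records an MD5 checksum of the uploaded data and reports it in a checksum field; the sketch below computes the local value for comparison. The function name local_md5 is illustrative, and the md5sum utility is assumed to be available.

```shell
# local_md5 FILE
# Prints the MD5 digest of FILE as a hex string.
local_md5() {
    md5sum "$1" | cut -d ' ' -f 1
}

# Example:
# local_md5 Hortonworks-Sandbox-1.3.qcow2
# glance image-show Hortonworks-Sandbox-1.3   # compare against the checksum field
```

If the two values differ, re-running the image-create command is usually simpler than trying to repair the stored image.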
Step 5: Boot Sandbox
Since the Hortonworks Sandbox uses an expandable file system which could get large, it is best to use a flavor that provisions enough ephemeral disk space to satisfy this requirement. In a default OpenStack setup, this is the m1.large flavor. To boot the image, use the OpenStack Horizon UI or execute the following command from the command line:
nova boot --flavor m1.large --image Hortonworks-Sandbox-1.3 --key-name default mySandbox
The below screenshot shows the output of the above command.
Note that the --key-name argument above will need to change based on which key pair you want to use to boot your image. It is only necessary if you want password-less ssh access to the Hortonworks Sandbox.
Step 6: Access Sandbox
In order to access Hortonworks Sandbox through ssh and http, you must ensure that the security group assigned to the instance opens the following ports: 22 (for ssh access), 8888 (for Sandbox itself) and 8000 (for Hue).
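With the legacy nova client, these ports can be opened from the command line using secgroup-add-rule. The sketch below is one way to do it, under stated assumptions: the group name "default" and the CIDR 0.0.0.0/0 in the example are placeholders (a narrower CIDR is usually preferable), and the NOVA variable and function name are illustrative.

```shell
# Client command; overridable so the function can be dry-run (e.g. NOVA=echo).
NOVA="${NOVA:-nova}"

# open_sandbox_ports GROUP CIDR
# Adds one TCP ingress rule per Sandbox port (22 ssh, 8888 Sandbox, 8000 Hue)
# to security group GROUP, allowing traffic from CIDR.
open_sandbox_ports() {
    group="$1"
    cidr="$2"
    for port in 22 8888 8000; do
        "$NOVA" secgroup-add-rule "$group" tcp "$port" "$port" "$cidr" || return 1
    done
}

# Example (group name and CIDR are assumptions; adjust to your environment):
# open_sandbox_ports default 0.0.0.0/0
```

The same rules can also be added through the Horizon UI if that is more convenient.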
Lastly, in order to access the instance, a public IP address must be made available. Depending on the OpenStack configuration, this may be performed automatically upon nova provisioning. Typically, however, this is done by assigning a floating IP address and using the assigned address to access Sandbox, e.g. http://floating_ip:8888
Assigning a floating IP address can be done in one of two ways: through the command line using the nova python client or through the Horizon UI. It is also possible that floating IP addresses are assigned automatically upon instance provisioning in which case there is no extra step to be completed.
Floating IP assignment via nova python client
When assigning a floating IP to an instance, it is first necessary to get a list of available floating IPs which can be done by executing the following command:
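With the legacy nova client of that era, the listing is typically done with "nova floating-ip-list", and "nova floating-ip-create" allocates a new address to the project when none is free. The sketch below wraps both; the NOVA variable and the function names are illustrative assumptions, and the exact subcommands may differ in other client versions.

```shell
# Client command; overridable so the functions can be dry-run (e.g. NOVA=echo).
NOVA="${NOVA:-nova}"

# List the floating IPs already allocated to the current project.
list_floating_ips() {
    "$NOVA" floating-ip-list
}

# Allocate one more floating IP to the project (useful when none is free).
allocate_floating_ip() {
    "$NOVA" floating-ip-create
}

# Example:
# list_floating_ips
# allocate_floating_ip
```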
Select any of the available Floating IPs and assign it to your Sandbox instance:
nova add-floating-ip mySandbox 172.18.3.1
Floating IP assignment via Horizon console
For those more comfortable with a graphical means of working with OpenStack, assigning a public IP address can easily be accomplished through the Horizon UI, by navigating to the instances menu and selecting Associate Floating IP under the “More…” menu for the Sandbox instance as illustrated in the below screenshot:
Making Hortonworks Sandbox available in your OpenStack environment is a great way of quickly creating instances of Hortonworks Sandbox for a large number of users without requiring a desktop virtualization solution. Have fun!
A note on tutorials: Some Hortonworks Sandbox tutorials mention specific IP addresses. Wherever such an address refers to the Hortonworks Sandbox (e.g. 127.0.0.1 or 172.16.124.128), substitute the public IP address assigned to the OpenStack instance.