Hortonworks and VMware have been working jointly for more than two years. We worked with VMware on the initial launch of Serengeti, on Apache Hadoop High Availability, and on validating and performance-testing the Hortonworks Data Platform (HDP) software on the VMware vSphere platform. One result of this work is that HDP is fully certified on VMware vSphere 5.1 and later.
Now the two companies are also working together to innovate in the area of ease of use for administrators of virtualized HDP clusters. This blog explains the features that are available for using Apache Ambari and VMware vCenter (with vSphere Big Data Extensions) in concert to cleanly provision, manage, and monitor your virtualized Hadoop clusters.
Apache Ambari is the open source tool of choice for Hadoop administrators and for architects who are examining the behavior/performance of their applications and clusters at runtime. VMware vCenter is the main tool used by vSphere/virtualization administrators to manage virtual machines and monitor the hosts and the various resources, such as storage and networking, which are consumed by those virtual machines.
Ideally, when a Hadoop cluster is virtualized, you can combine two views of the system: the Hadoop-level view in Ambari and the infrastructure-level view from vCenter. vSphere Big Data Extensions (BDE) delivers on the vision of a single unified view by integrating with Ambari. BDE plugs into vCenter as an additional server, so administrators can design and create an HDP cluster through the familiar vCenter screens.
The process of building a virtualized HDP cluster has two separate steps: creating and configuring a set of virtual machines; and provisioning HDP on the available virtual machines.
The first step is to create a set of virtual machines with their guest operating systems, networking configured appropriately, correct users created and other appropriate services configured. Virtualization makes all of that easy through cloning – and the vCenter Web Client gives you a friendly way of doing that. Naturally, the various virtual machines can be differently sized for different node requirements within a Hadoop cluster.
Making the choices on where to place those virtual machines, once they are fully configured, is better done by the vCenter placement algorithms. Those types of configuration and VM-to-server host placement intelligence are built into the vSphere Big Data Extensions features. BDE takes care of cloning the right sizes and numbers of virtual machines for you. BDE makes it much easier to repeat this process where multiple clusters are needed – and reduces the occurrence of human operator error.
With a set of virtual machines available, the second step in the installation is to provision HDP software in various modes on the new virtual machines. You have three choices as to how to accomplish this step:
That third option (leveraging Ambari Blueprints) is new in the latest release of VMware vSphere BDE. Using Blueprints, the BDE tool can now integrate with Ambari, making the provisioning of HDP onto virtual machines seamless.
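To make the Blueprints idea concrete, here is a minimal sketch of the two JSON documents an Ambari Blueprint deployment is built from: the blueprint itself (which components go in which host group) and a cluster creation template (which hosts fill each group). This is not how BDE generates them internally; the blueprint name, stack version, group names, component choices, and host names below are all assumptions chosen for illustration.

```python
import json


def build_blueprint(name, stack_version, host_groups):
    """Return an Ambari Blueprint document as a dict.

    host_groups maps a group name to (cardinality, [component names]).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": c} for c in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


def build_cluster_template(blueprint_name, hosts_by_group):
    """Return a cluster creation template that maps hosts to host groups."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": h} for h in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }


# Hypothetical layout: one master VM and three worker VMs.
blueprint = build_blueprint(
    "hdp-small",   # assumed blueprint name
    "2.1",         # assumed HDP stack version
    {
        "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
        "worker": (3, ["DATANODE", "NODEMANAGER"]),
    },
)
template = build_cluster_template(
    "hdp-small",
    {
        "master": ["master1.example.com"],
        "worker": [f"worker{i}.example.com" for i in range(1, 4)],
    },
)

print(json.dumps(blueprint, indent=2))
print(json.dumps(template, indent=2))
```

In a manual workflow these two documents would be posted to the Ambari REST API; with the integration described here, BDE takes on that step for you.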
The BDE-Ambari integration works well for user communities where an IT operations person or an architect provides Hadoop-as-a-Service. For example, an architect, developer or QA tester comes to the administrator with a request, perhaps with a specification of the desired cluster. The vSphere administrator can now carry out the provisioning task for them without needing to know how to navigate a second tool. Of course, a Hadoop administrator or developer who wants to generate a cluster of their own can also deploy an HDP cluster through vSphere Big Data Extensions, provided they have the appropriate rights in the vCenter environment.
Here is one view of the configuration of various virtual machine types (called “Node Groups” here) to handle the different roles in the HDP cluster. This is the key set of decisions the cluster designer makes when laying out a cluster. The same layout can also be defined from the command line by asking the BDE CLI to create a cluster from a specification file that contains these details. This approach allows a more fine-grained distribution of the Hadoop roles across separate virtual machines.
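As a rough illustration of what such a specification file might hold, the sketch below assembles a set of node groups, runs a few sanity checks, and renders them as JSON. The field names (`nodeGroups`, `roles`, `instanceNum`, `cpuNum`, `memCapacityMB`) are modeled on the Serengeti-style JSON layout and are an assumption here, as are the role names and sizes; check them against the BDE documentation for your release before relying on them.

```python
import json

# Hypothetical node groups: names, roles, and sizes are illustrative only.
NODE_GROUPS = [
    {"name": "master", "roles": ["NAMENODE", "RESOURCEMANAGER"],
     "instanceNum": 1, "cpuNum": 4, "memCapacityMB": 16384},
    {"name": "worker", "roles": ["DATANODE", "NODEMANAGER"],
     "instanceNum": 4, "cpuNum": 2, "memCapacityMB": 8192},
    {"name": "client", "roles": ["HDFS_CLIENT", "YARN_CLIENT"],
     "instanceNum": 1, "cpuNum": 1, "memCapacityMB": 4096},
]


def validate_node_groups(groups):
    """Check basic consistency and return the total number of VMs requested."""
    names = [g["name"] for g in groups]
    if len(names) != len(set(names)):
        raise ValueError("node group names must be unique")
    for g in groups:
        if g["instanceNum"] < 1:
            raise ValueError(f"group {g['name']!r} needs at least one instance")
        if not g["roles"]:
            raise ValueError(f"group {g['name']!r} has no roles assigned")
    return sum(g["instanceNum"] for g in groups)


def render_spec(groups):
    """Render the node groups as the JSON text of a specification file."""
    validate_node_groups(groups)
    return json.dumps({"nodeGroups": groups}, indent=2)


total_vms = validate_node_groups(NODE_GROUPS)
spec_text = render_spec(NODE_GROUPS)
print(f"{total_vms} virtual machines requested")
print(spec_text)
```

Keeping the layout in a file like this makes it easy to version and reuse when the same cluster shape is needed more than once.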
Once you have an Ambari Server instance installed in either a virtual or physical environment at your site (either choice is fine), you simply provide the Ambari REST API URL and server port number to BDE in a configuration step to create a new “Application Manager.” You can then go ahead and use BDE to configure and install HDP in an Ambari-driven way. In fact, you can watch the same actions being taken in either tool at cluster creation time and monitor the new cluster from both tools from there onward.
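For readers who have not used the Ambari REST API directly, the following sketch shows how the URL that BDE asks for is typically composed from a host and port, and how an authenticated request against it could be built. The host name and credentials are placeholders, port 8080 is Ambari's default, and the request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def ambari_api_url(host, port=8080, use_tls=False):
    """Compose the base URL of the Ambari REST API (version 1)."""
    scheme = "https" if use_tls else "http"
    return f"{scheme}://{host}:{port}/api/v1"


def build_request(base_url, path, user, password, payload=None):
    """Build (but do not send) a request with HTTP basic authentication.

    A JSON payload turns the request into a POST; otherwise it is a GET.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        f"{base_url}{path}",
        data=data,
        method="POST" if data is not None else "GET",
    )
    req.add_header("Authorization", f"Basic {token}")
    # Ambari expects this header on requests that modify state.
    req.add_header("X-Requested-By", "ambari")
    return req


# Placeholder host and credentials, for illustration only.
base = ambari_api_url("ambari.example.com")
list_clusters = build_request(base, "/clusters", "admin", "admin")
print(list_clusters.get_method(), list_clusters.full_url)

# To actually query a reachable Ambari server, one could run:
# with urllib.request.urlopen(list_clusters) as resp:
#     print(json.load(resp))
```

Listing `/clusters` this way is a quick check that the URL and port you hand to BDE really point at a live Ambari Server.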
Hortonworks and VMware plan to continue to work together to enhance the user experience for provisioning virtualized HDP clusters and more innovations are in the works here – so watch this space!