
HDP on Linux – Installation Forum

HMC with virtual nodes

  • #12274
    John Burnett

I would like to build the following configuration: 1) an HMC node on a physical server (which also NFS-exports a KVM datastore), and 2) a separate physical server with two KVM virtual nodes, one NameNode and one DataNode. The KVM image for the NameNode resides on the NFS directory exported from the HMC server. This would allow me to test out a cluster and use KVM live migration to move the NameNode to another KVM server. Is this doable? And if so, what installation method would be best to employ? If I use the HMC install method, can I direct the NameNode and DataNode roles to specific (virtual) servers? Or can I use gsInstaller and add HMC at a later date? This is for development purposes only. Thanks, RJB
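For reference, with the guest's disk image on shared NFS storage, the live migration being described would come down to a single `virsh migrate` call. A minimal sketch follows; the domain and host names are hypothetical, and the `VIRSH` variable exists only so the command can be dry-run:

```shell
#!/bin/sh
# Hypothetical sketch: live-migrate the NameNode guest between KVM hosts.
# This only works because the disk image sits on shared NFS storage, so
# just the guest's memory/CPU state has to move. All names are made up.
VIRSH="${VIRSH:-virsh}"   # override (e.g. VIRSH=echo) for a dry run

migrate_namenode() {
    # --live keeps the guest running while its state is copied across
    "$VIRSH" migrate --live namenode-vm qemu+ssh://kvm-host-2.example.com/system
}
```

Whether HDFS tolerates the brief pause at switchover is exactly the untested part discussed below in the thread.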

  • #12331
    Seth Lyubich

    Hi John,

In theory this setup should be possible if your virtual machines meet the minimum requirements, can resolve all hosts on the network correctly, can ssh to each other, etc. Please refer to the documentation for minimum hardware and software requirements when you build your virtual machines. Please note that this configuration has never been tested on our side and is not ideal.

    Here are some more additional notes regarding questions that you posted:

    >> If I use the HMC install method, can I direct Namenode and Datanode roles to specific (virtual) servers?
Yes, as long as HMC is able to find all nodes. There is a limitation where you are not able to run a DataNode on the HMC server.

>> Or can I use gsInstaller and add HMC at a later date?
    This is not possible at this time.

    Hope this helps.


    Steve Loughran

    I think the live migration of the Namenode is something you would need to “tread carefully” on. It is very much something that’s not been tested and is a key risk point.

We have done a lot of work and testing on using Linux HA for handling NN failover; this uses a floating IP address and will mount and remount the NFS drive as it moves the process.
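For context, a Linux HA setup like the one described would be expressed along these lines in Pacemaker's crm shell. The resource names, addresses, and NFS export below are all illustrative assumptions, not a tested Hortonworks configuration:

```
# Illustrative only: NFS mount plus floating IP, grouped so they
# move together to whichever node runs the NameNode after failover.
primitive nn-fs ocf:heartbeat:Filesystem \
    params device="hmc-host:/export/nn" directory="/hadoop/nn" fstype="nfs"
primitive nn-ip ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.50" cidr_netmask="24"
group namenode-group nn-fs nn-ip
```

Grouping enforces ordering: the filesystem comes up before the IP, and both are torn down on the old node before the new one takes over.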

* you can bring up a Linux HA cluster as VMs; it's one way we did a lot of our failure testing.

* you must have a “STONITH” mechanism for one of the VMs to kill the other if they ever lose contact. Normally that's a network- or serial-port-addressable power supply. In the VM world, a shell script that sshexec's a command on the host server to power off the other VM can be used.
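A minimal sketch of that VM-world fencing script, assuming libvirt/KVM on the host; the host name, domain name, and root ssh access are all assumptions, and `SSH` is overridable purely so the script can be exercised without a real host:

```shell
#!/bin/sh
# Hypothetical STONITH agent for a two-VM HA pair: when this node loses
# contact with its peer, ssh to the KVM host and force the peer VM off.
# KVM_HOST and PEER_VM are illustrative names, not tested tooling.
KVM_HOST="${KVM_HOST:-kvm-host.example.com}"  # physical server running the peer
PEER_VM="${PEER_VM:-namenode-vm}"             # libvirt domain name of the peer
SSH="${SSH:-ssh}"                             # override (e.g. SSH=echo) to dry-run

stonith_peer() {
    # 'virsh destroy' is an immediate power-off, not a graceful shutdown --
    # exactly what fencing needs, so the fenced peer cannot keep writing.
    "$SSH" "root@$KVM_HOST" "virsh destroy $PEER_VM"
}
```

The hard power-off matters: a polite shutdown gives a half-dead NameNode time to keep touching the shared NFS state.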

Even in this world you have to make sure that switching and routing keep up with the floating IP address, which is the same problem you have with live VM migration. Keep all the failover nodes on the same switch, so that the MAC address seen by off-switch systems stays the same, and hope that the ARP refresh message gets round all the hosts on the local switch.
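The ARP refresh step can be sketched as follows. This is roughly what an HA resource agent such as IPaddr2 does when the floating IP lands on a new node; the interface and address are illustrative, and the `IP`/`ARPING` variables are only there to allow a dry run:

```shell
#!/bin/sh
# Sketch: bring up the floating IP on this node, then broadcast
# gratuitous ARP so hosts on the switch refresh their ARP caches.
# Address and interface are made-up examples.
VIP="${VIP:-192.168.1.50/24}"
IFACE="${IFACE:-eth0}"
IP="${IP:-ip}"              # override (e.g. IP=echo) to dry-run
ARPING="${ARPING:-arping}"  # override (e.g. ARPING=echo) to dry-run

takeover_vip() {
    "$IP" addr add "$VIP" dev "$IFACE" &&
    "$ARPING" -U -c 3 -I "$IFACE" "${VIP%/*}"   # -U: unsolicited (gratuitous) ARP
}
```

Without the gratuitous ARP, neighbours keep sending traffic for the VIP to the old node's MAC address until their ARP entries expire.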

HA NameNode for HDFS 1
High Availability Hadoop

The forum ‘HDP on Linux – Installation’ is closed to new topics and replies.
