
The legacy Hortonworks Forum is now closed. You can view a read-only version of the former site by clicking here. The site will be taken offline on January 31, 2016.

HDP on Linux – Installation Forum

How to configure a multi-node cluster

  • #43839
    Durga Prasad


    Can anyone please explain how to set up a multi-node cluster?


  • #43856
    Robert Molina

    Hi Durga,
    Have you looked into using HDP’s Ambari product to set up a multi-node cluster? Here is documentation with the steps on how to do so.


    Vidy G

    I am trying to set up a two-node cluster using the HDP 2.0 sandbox. I believe we need two different VMs or physical machines to set up a two-node cluster. Is that correct?

    I set up a sandbox VM and cloned it to create a second VM. I enabled Ambari in sandbox 1 to configure sandbox 2 as the second node in the cluster, but Ambari failed to register the second sandbox; the log file reported issues with the hostname. I tried to modify the hostname of the second VM with no luck. Has anyone tried this before? If so, what would be a simple way of setting up a two-node HDP cluster?

    Son Hai Ha

    I hope this can help. I summarize the manual guide here:

    The process below describes installing Ambari 1.5.1 on a cluster of VMs in OpenStack, with some ports and resource websites blocked by the company firewall. The VMs running Ambari use the standard “CentOS 6.4 minimal” image. We intended to install Hadoop 1.3.3 on the cluster.

    + Edit the file /etc/hosts on all hosts to use fully qualified domain names, appending one record per node to the end of the file, like this:
    ###.###.###.### node1.hadoop.test node1
    ###.###.###.### node2.hadoop.test node2

    so that nodes can ping each other by hostname.
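    The /etc/hosts step above can be sketched as a script. The IPs and hostnames below are placeholders I have made up, and a temp file stands in for /etc/hosts so the sketch is safe to run as-is:

```shell
#!/bin/sh
# Sketch of the /etc/hosts step. IPs/hostnames are placeholder assumptions;
# a temp file stands in for /etc/hosts so nothing on the system is touched.
HOSTS_FILE=$(mktemp)          # on a real node, append to /etc/hosts instead
cat > "$HOSTS_FILE" <<'EOF'
192.0.2.11 node1.hadoop.test node1
192.0.2.12 node2.hadoop.test node2
EOF
# Sanity check: every record should be "IP  FQDN  short-alias"
valid=$(grep -Ec '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ [A-Za-z0-9.-]+ [A-Za-z0-9-]+$' "$HOSTS_FILE")
echo "valid lines: $valid"
```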

    + Edit hostname for each node:
    vi /etc/sysconfig/network
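    For reference, a sketch of what the edit should leave behind on CentOS 6. The FQDN is a placeholder, and a temp file stands in for /etc/sysconfig/network so this can be run safely anywhere:

```shell
#!/bin/sh
# Sketch only: FQDN is a placeholder; a temp file stands in for the real
# /etc/sysconfig/network path.
FQDN=node1.hadoop.test
NET_FILE=$(mktemp)            # on a real node: /etc/sysconfig/network
cat > "$NET_FILE" <<EOF
NETWORKING=yes
HOSTNAME=$FQDN
EOF
# On the real node, apply it immediately with: hostname "$FQDN"
grep "^HOSTNAME=" "$NET_FILE"
```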


    + Disable iptables for ambari on all hosts
    chkconfig iptables off
    /etc/init.d/iptables stop

    + Disable SELinux on all hosts (note that setenforce 0 lasts only until reboot; set SELINUX=disabled in /etc/selinux/config to make it permanent)
    setenforce 0

    + Set the umask value on all hosts
    umask 022
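    A quick way to confirm what umask 022 does (this is standard POSIX behavior, nothing HDP-specific): newly created files come out mode 644 and directories 755:

```shell
#!/bin/sh
# Demonstrate the effect of umask 022 in a temp directory.
umask 022
d=$(mktemp -d)
touch "$d/file"
mkdir "$d/dir"
fperm=$(stat -c %a "$d/file")   # expect 644 (666 & ~022)
dperm=$(stat -c %a "$d/dir")    # expect 755 (777 & ~022)
echo "file=$fperm dir=$dperm"
```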

    + Run the NTP service on all hosts
    yum install ntp ntpdate ntp-doc (install)
    chkconfig ntpd on (turn on service)
    ntpdate (update time)
    /etc/init.d/ntpd start (start server)

    + Disable IPv6 (optional, in case ambari-server listens on an IPv6 port)
    sysctl -w net.ipv6.conf.all.disable_ipv6=1
    sysctl -w net.ipv6.conf.default.disable_ipv6=1

    + Set up your local repository (optional, if the Ambari server cannot reach the Hortonworks repositories)
    ++Install Apache Webserver:
    yum install httpd
    /etc/init.d/httpd start

    ++Download HDP packages at:
    yum install yum-utils createrepo
    mkdir -p /var/www/html/
    cd /var/www/html/

    untar the file here, then run createrepo on the directory

    – Open ports 8440 and 8441 in the security group, otherwise the Ambari agents cannot register with the Ambari server.
    – Open port 2181, 2888, 3888 for ZooKeeper
    – Open port 60000, 60010, 60020, and 60030 for HBase
    – Open port 50111 for WebHCat
    – Open port 50070, 50470, 8020, 9000, 50075, 50475, 50010, 50020, and 50090 for HDFS
    – Open port 51111, 19888, 50060, 50030, 9021 for MapReduce (13562 and 50300 not specified in the manual guide but should be opened)
    – Open port 10000 and 9083 for Hive
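    One quick way to verify that a port in the lists above is reachable is bash's built-in /dev/tcp redirection. The host below is a placeholder; point it at your actual Ambari server (e.g. for ports 8440/8441 before registering agents):

```shell
#!/bin/bash
# check_port HOST PORT -> exit 0 if a TCP connection succeeds.
# Uses bash's /dev/tcp feature; the 2-second timeout avoids hanging
# on filtered ports.
check_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
# Placeholder host: substitute your Ambari server's FQDN or IP.
for p in 8440 8441; do
  if check_port 127.0.0.1 "$p"; then
    echo "port $p open"
  else
    echo "port $p closed"
  fi
done
```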

    Run Ambari Server Setup
    ambari-server setup

    Start Ambari Server
    ambari-server start

    Access the Ambari web UI (by default on port 8080 of the Ambari server host):
    Follow the wizard to create your cluster.
    It will ask for the list of nodes you want to set up; enter them by their FQDNs.

    Prem Kumar


    I set up the two-node cluster and all the services are up and running; every service shows green. But my question is: if I click the Metrics button, I see the following:
    Disk Usage: n/a
    DataNodes Live: 1/1
    NameNode & SecondaryNameNode: 1 DataNode
    Memory Usage: No data available. A possible reason is an inaccessible Ganglia service
    Network Usage: No data available. A possible reason is an inaccessible Ganglia service
    CPU Usage: No data available. A possible reason is an inaccessible Ganglia service
    Cluster Load: No data available. A possible reason is an inaccessible Ganglia service
    NameNode Heap: n/a
    NameNode RPC: n/a
    NameNode CPU WIO: n/a
    NameNode Uptime: n/a
    NameNode Master Heap: n/a
    HBase Links: no active Master, 1 RegionServer, n/a
    HBase Avg Load: n/a
    HBase Master Uptime: n/a
    ResourceManager Heap: n/a
    ResourceManager Uptime: n/a
    NodeManagers Live: 1/1
    YARN Memory: n/a
    Supervisors Live: 1/1

    How can I get values for all the metrics?

    Son Hai Ha

    Hi Kumar,
    Did you also install the Nagios and Ganglia services on the cluster? Those services report the usage metrics. Make sure the Ganglia monitor on each node is running and that the Ganglia server is running to receive the reports.
    Please make sure these ports are not blocked: TCP 8625, 8552, 8649, 8651, 8652, 8655, 8656, 8658, 8659, 8660, 8661, 8662, 8663, 8666 and UDP 6343, 8649, 8656, 8658, 8659, 8660, 8661, 8662, 8663, 8666 for Ganglia (most of these ports are not mentioned in the manual).
    Sincerely yours,

    Prem Kumar

    Hi Son Hai Ha,

    Apologies for the late reply.

    Could you kindly let me know how I can check that the ports are not blocked? Even after disabling the firewall, I have yet to verify that the ports are open. Please clarify.


    vikash pandey


    The supported OS is Ubuntu 12.04 for a multi-node cluster using Ambari 1.7 and HDP 2.2.

    Follow the steps below
    echo “os Check”
    cat /etc/issue

    echo “program check”

    whereis rpm
    whereis scp
    whereis curl
    whereis unzip
    whereis tar
    whereis wget
    whereis openssl
    whereis python

    echo “execute hostname and then nslookup hostname to verify that the name resolves to the correct IP address”
    nslookup hostname

    In /etc/hosts, add the IP and FQDN of each node:
    sudo vi /etc/hosts

    echo “set up ulimit”
    open the file: vi /etc/security/limits.conf

    add following lines to the end of file
    * soft nofile 10000
    * hard nofile 10000

    Then logout and re-login
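    After logging back in, you can confirm the new limits took effect (the 10000 value assumes the limits.conf entries above were applied; a non-login shell may still show the old value):

```shell
#!/bin/sh
# Print the current soft and hard open-file limits for this shell.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard"
```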


    Copy the SSH public key to all hosts:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    ssh root@f.q.d.n
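    The key-copy step can be sketched as follows; the temp directory is my substitution for ~/.ssh so the commands are safe to run anywhere, and on real nodes you would copy the public key to every host (e.g. with ssh-copy-id root@f.q.d.n):

```shell
#!/bin/sh
# Sketch of the passwordless-SSH setup Ambari needs, run against a temp
# directory instead of ~/.ssh so nothing on the system is modified.
SSH_DIR=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
keys=$(grep -c "^ssh-rsa" "$SSH_DIR/authorized_keys")
echo "authorized keys: $keys"
```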

    Alternative to chkconfig:
    sudo apt-get install ntp
    sudo apt-get install sysv-rc-conf
    sudo sysv-rc-conf ntp on
    sysv-rc-conf --list
    sysv-rc-conf --list ntp

    echo “set umask”
    umask 022

    in .bashrc file set umask 022

    For getting the repository key in Ubuntu 12:

    Ambari 1.7.0 repository file links for Ubuntu 12:

    wget -nv
    wget -nv -O /etc/apt/sources.list.d/ambari.list

    HDP 2.2 repository file links for UBUNTU 12

    wget -nv -O /etc/apt/sources.list.d/HDP.list

    sudo apt-key adv --recv-keys --keyserver hkp:// B9733A7A07513CAD

    apt-get update

    apt-cache pkgnames

    apt-get install ambari-server

    savita sheoran

    I have installed VirtualBox 4.3 and imported the sandbox into it, but after boot, when I try to log in with the command
    ssh root@ -p 2222
    I get “connection refused”. Why is that? Please reply as soon as possible.

The forum ‘HDP on Linux – Installation’ is closed to new topics and replies.
