HDP on Linux – Installation: Node discovery and preparation fails to find all nodes

This topic contains 9 replies, has 2 voices, and was last updated by Sasha J 1 year, 12 months ago.

    #7644

    I have a five-node cluster, and during the add-nodes part of the HDP installation it fails to find 4 of the 5 nodes. It only finds the node where I installed HDP. I am able to ssh to all the nodes in the cluster as root. The hostnames all resolve via DNS from the install node. I even added the hostnames to the /etc/hosts file and tried IPs, and I get the same error for the 4 nodes.

    Failed. Reason: ssh: hadoop02.dev.corp.oversee.net
    : Name or service not known

    Failed. Reason: ssh: hadoop03.dev.corp.oversee.net
    : Name or service not known

    Failed. Reason: ssh: hadoop04.dev.corp.oversee.net
    : Name or service not known

Viewing 9 replies - 1 through 9 (of 9 total)

The topic ‘Node discovery and preparation fails to find all nodes’ is closed to new replies.

    #7711

    Sasha J
    Moderator

    Juan,

    I’m told that someone from POC support has contacted you. Your CentOS 5.5 installation requires a few packages to be installed manually. The person who contacts you will assist you further.

    Sasha

    #7684

    It was a clean install.

    #7683

    [jsandoval@hadoop05 ~]$ sudo /usr/local/sbin/memconf
    memconf: V2.22 30-Jan-2012 http://www.4schmidts.com/unix.html
    hostname: hadoop05.dev.corp.oversee.net
    Dell Inc. PowerEdge R610 (2 X Six-Core Hyper-Threaded Intel(R) Xeon(R) X5690 @ 3.47GHz)
    Memory Error Correction: Multi-bit ECC
    Maximum Memory: 196608MB (192GB)
    DIMM_A1: 8192MB 1333MHz Synchronous DDR3 DIMM, Hynix Semiconductor (Hyundai Electronics) HMT31GR7BFR4A-H9
    DIMM_A2: 8192MB 1333MHz Synchronous DDR3 DIMM, Hynix Semiconductor (Hyundai Electronics) HMT31GR7BFR4A-H9
    DIMM_A3: 8192MB 1333MHz Synchronous DDR3 DIMM, Hynix Semiconductor (Hyundai Electronics) HMT31GR7BFR4A-H9
    DIMM_B1: 8192MB 1333MHz Synchronous DDR3 DIMM, Hynix Semiconductor (Hyundai Electronics) HMT31GR7BFR4A-H9
    DIMM_B2: 8192MB 1333MHz Synchronous DDR3 DIMM, Hynix Semiconductor (Hyundai Electronics) HMT31GR7BFR4A-H9
    DIMM_B3: 8192MB 1333MHz Synchronous DDR3 DIMM, Hynix Semiconductor (Hyundai Electronics) HMT31GR7BFR4A-H9
    empty memory sockets: DIMM_A4, DIMM_A5, DIMM_A6, DIMM_B4, DIMM_B5, DIMM_B6
    total memory = 49152MB (48GB)

    [jsandoval@hadoop05 ~]$ cat /etc/redhat-release
    CentOS release 5.5 (Final)

    [jsandoval@hadoop05 ~]$ uname -a
    Linux hadoop05.dev.corp.oversee.net 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

    #7664

    Sasha J
    Moderator

    Hi Juan,

    Can you please post your OS version, whether it is a clean install, and any modifications you have made to it since it was installed?

    -Sasha

    #7657

    I just sent you an email with my contact information. Thanks for your help.

    #7654

    Sasha J
    Moderator

    Hi Juan

    Thank you for taking the time to investigate these issues with us.

    Would you have some time to do a WebEx?

    If so, please send your contact information to POC-SUPPORT@HORTONWORKS.COM and we will follow up with you.

    Thanks in advance,

    Sasha

    #7647

    Now I am getting the following error. A negative node?

    Finding reachable nodes: -1 / 5 in progress; 6 failed

    #7646

    I am able to ssh to all of the nodes with no password. The hmc.log shows the following error.

    [2012:07:26 21:37:20][INFO][HMCTxnUtils][HMCTxnUtils.php:116][execBackgroundProcess]: Found child pid, command=/usr/bin/php ./addNodes/findSshableNodes.php, txnId=1, output=Executing /usr/bin/php ./addNodes/findSshableNodes.php metroid root 1 100 2 /var/run/hmc/clusters/metroid/hosts.txt > /var/log/hmc/hmc.txn.1.log 2>&1
    Background Child Process PID:15911
    , pid=15911
    [2012:07:26 21:37:20][INFO][findSshableNodes][commandUtils.php:76][runPdsh]: Hosts for this operation: “\/var\/run\/hmc\/clusters\/metroid\/hosts.txt”
    [2012:07:26 21:37:20][INFO][findSshableNodes][commandUtils.php:80][runPdsh]: Going to execute findSshableNodes : pdsh -R exec /var/run/hmc/clusters/metroid/findSshableNodes//ssh.sh %h
    [2012:07:26 21:37:20][INFO][findSshableNodes][commandUtils.php:7][launchCmd]: Env variable WCOLL is “\/var\/run\/hmc\/clusters\/metroid\/hosts.txt”
    [2012:07:26 21:37:20][INFO][findSshableNodes][findSshableNodes.php:121][]: Going to persist information sshAble nodes
    [2012:07:26 21:37:20][INFO][Add nodes poller][nodesActionProgress.php:33][]: Cluster Name: metroid Root Txn ID: 1
    [2012:07:26 21:37:21][ERROR][sequentialScriptExecutor][sequentialScriptRunner.php:251][]: Encountered total failure in transaction 100 while running cmd: /usr/bin/php ./addNodes/findSshableNodes.php with args: metroid root 1 100 2 /var/run/hmc/clusters/metroid/hosts.txt

    #7645

    Sasha J
    Moderator

    Hi Juan,

    All the nodes must be resolvable from the host running the HMC process.

    If you are certain that the entries in the hosts file you uploaded exactly match the hostnames in /etc/hosts on ALL hosts, then your name resolution service may be configured to query an external DNS source BEFORE the local /etc/hosts.
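
    On a glibc-based system such as CentOS, the lookup order is normally set by the hosts: line in /etc/nsswitch.conf. The sketch below is only an illustration and is not part of HMC; the function names hosts_lookup_order and files_before_dns are made up for this example.

```shell
# Print the lookup sources, in order, from the "hosts:" line of an
# nsswitch file (defaults to /etc/nsswitch.conf).
hosts_lookup_order() {
    awk '$1 == "hosts:" { for (i = 2; i <= NF; i++) print $i }' \
        "${1:-/etc/nsswitch.conf}"
}

# Succeed only if "files" (/etc/hosts) is listed, and listed before "dns".
files_before_dns() {
    hosts_lookup_order "$1" | awk '
        $1 == "files" && !f { f = NR }
        $1 == "dns"   && !d { d = NR }
        END { exit !(f && (!d || f < d)) }'
}
```

    Running `files_before_dns && echo "files first"` on the HMC host, followed by `getent hosts hadoop02.dev.corp.oversee.net`, should show whether /etc/hosts is consulted first and which address the name actually resolves to.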

    A good test is to manually ssh into the host running HMC, then from there ssh into each target node, making sure you can do so using the same name that appears in /etc/hosts and without a password.
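
    The manual test above can be scripted roughly as follows. This is a sketch under assumptions, not something HMC provides: the function names are hypothetical, the hosts file is assumed to hold one hostname per line, and stripping carriage returns and trailing whitespace is only a precaution, not a confirmed cause of the failure in this thread. BatchMode=yes makes ssh fail instead of prompting for a password.

```shell
# Emit the hostnames from a hosts file with carriage returns, trailing
# whitespace, and blank lines removed.
clean_hosts() {
    tr -d '\r' < "$1" | sed -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Check one host: name resolution first, then non-interactive SSH.
# $1 = hostname, $2 = remote user (defaults to root).
check_host() {
    if ! getent hosts "$1" > /dev/null; then
        echo "FAIL resolve $1"
        return 1
    fi
    # -n keeps ssh from consuming the host list on stdin.
    if ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 \
            "${2:-root}@$1" true 2> /dev/null; then
        echo "FAIL ssh $1"
        return 1
    fi
    echo "OK $1"
}

# Check every host in the file; exit status is non-zero if any host failed.
# $1 = hosts file, $2 = remote user (defaults to root).
check_all() {
    clean_hosts "$1" | (
        rc=0
        while IFS= read -r host; do
            check_host "$host" "${2:-root}" || rc=1
        done
        exit "$rc"
    )
}
```

    Run from the HMC host, for example as `check_all /var/run/hmc/clusters/metroid/hosts.txt root` (the path that appears in hmc.log), it prints one OK or FAIL line per node.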

    -Sasha
