HDP on Linux – Installation: adding new nodes

This topic contains 20 replies, has 4 voices, and was last updated by  Sanjeev 2 years, 2 months ago.

  • Creator
    Topic
  • #6596

    Adding new nodes fails with the following error

    # cat hmc.txn.4.log
    PHP Warning: fsockopen(): unable to connect to ip-10-0-1-44.c2hpc.com:8139 (Connection refused) in /usr/share/hmc/php/frontend/addNodes/finalizeNodes.php on line 34
    PHP Warning: fsockopen(): unable to connect to ip-10-0-1-44.c2hpc.com:8139 (Connection refused) in /usr/share/hmc/php/frontend/addNodes/finalizeNodes.php on line 34
    PHP Warning: fsockopen(): unable to connect to ip-10-0-1-44.c2hpc.com:8139 (Connection refused) in /usr/share/hmc/php/frontend/addNodes/finalizeNodes.php on line 34

    I am trying to add one node to the cluster. Both machines are on Amazon.
    After the installation failed, I get the following error when I try a second time:
    “Some hosts in the given file are already being used in cluster”

    Is there a database where this data gets updated? Nothing is present in MySQL.

    Details from hmc.log:
    [2012:07:03 09:32:10][INFO][Add nodes poller][nodesActionProgress.php:33][]: Cluster Name: gtocluster Root Txn ID: 4
    [2012:07:03 09:32:10][ERROR][PuppetFinalize:txnId=4:subTxnId=104][finalizeNodes.php:122][sign_and_verify_agent]: Timed out waiting for all puppet agents to ping master
    [2012:07:03 09:32:10][INFO][PuppetFinalize:txnId=4:subTxnId=104][finalizeNodes.php:126][sign_and_verify_agent]: Re-checking to ensure all puppet hosts are signed
    [2012:07:03 09:32:22][ERROR][PuppetFinalize:txnId=4:subTxnId=104][finalizeNodes.php:204][sign_and_verify_agent]: Failed to ping puppet agent on host [ip-10-0-1-44.c2hpc.com]: Connection refused
    [2012:07:03 09:32:22][INFO][PuppetFinalize:txnId=4:subTxnId=104][finalizeNodes.php:235][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=0, failedHostsCount=1
    [2012:07:03 09:32:22][INFO][PuppetFinalize:txnId=4:subTxnId=104][finalizeNodes.php:255][sign_and_verify_agent]: Completed sign/verify puppet agent for 1 nodes, result=Array
    (
    [ip-10-0-1-44.c2hpc.com] => Array
    (
    [discoveryStatus] => FAILED
    [badHealthReason] => Puppet agent ping failed: , error=111, outputLogs=Puppet agent ping failed: [Connection refused]
    )

    )
    [2012:07:03 09:32:22][INFO][PuppetFinalize:txnId=4:subTxnId=104][finalizeNodes.php:369][]: Puppet finalize, succeeded for 0 and failed for 1 of total 1 hosts

    Any ideas?
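
The “Connection refused” warnings in this post come from a TCP connect to port 8139 on the new node (the port shown in the fsockopen() message). Below is a minimal sketch, in Python rather than HMC's PHP, of the same kind of reachability check run from the HMC node. The helper name can_connect is hypothetical and is not part of HMC.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False

# Example call, using the host and port from the log in this post:
#   can_connect("ip-10-0-1-44.c2hpc.com", 8139)
```

A False result only says that nothing accepted the connection; it does not say whether the cause is a stopped agent, a firewall rule, or a name that resolves to the wrong machine.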

Viewing 20 replies - 1 through 20 (of 20 total)

The topic ‘adding new nodes’ is closed to new replies.

  • Author
    Replies
  • #7891

    Sanjeev
    Participant

    Thanks Miguel. In that case I’m setting up my multi node cluster as you described above.

    regards
    Sanjeev

    #7890

    Sanjeev, I’m not familiar with VMware; you might want to contact a Hortonworks dev for help. You can find out how in various threads.
    Setting up a multi-node cluster with HMC just involves configuring another machine the same way as the first, except for the installation of HMC (you only need that on your deployment node), and making sure the deployment node can access the other machine through ssh. Include the FQDN in your hosts.txt file and voilà.

    Cheers,
    Miguel

    #7889

    Sanjeev
    Participant

    Hi Miguel,

    Thanks for your response.

    1) I’m using a VMware server to host my virtual machines.
    2) hostname -f returns the expected result:
    [root@cisvltsvm-02 ~]# hostname -f
    cisvltsvm-02.svl.ibm.com
    3) The format of /etc/hosts is as below:
    9.30.192.102 cisvltsvm-02.svl.ibm.com cisvltsvm-02
    9.30.192.100 cisvltsvm-01.svl.ibm.com cisvltsvm-01
    4) I configured static IPs using the instructions at http://www.cyberciti.biz/faq/rhel-centos-fedoracore-linux-network-card-configuration/

    The assigned host names are already registered with my DNS server, so they resolve properly.
    After all this I get the error below:

    [2012:07:31 03:33:44][ERROR][sequentialScriptExecutor][sequentialScriptRunner.php:261][]: Got error while getting hostInfo for ÿþc
    [2012:07:31 03:33:45][INFO][HMCTxnUtils][HMCTxnUtils.php:82][execBackgroundProcess]: Trying to background a new process, cluster=sanju, txnId=3, command=/usr/bin/php ./addNodes/obtainNodesInfo.php, args=sanju root 3 101 4 /var/run/hmc/clusters/sanju/hosts.txt, logFile=/var/log/hmc/hmc.txn.3.log, execCommand=/usr/bin/php /usr/share/hmc//php/util/BackgroundExecutor.php -t "3" -c "/usr/bin/php ./addNodes/obtainNodesInfo.php" -a "sanju root 3 101 4 /var/run/hmc/clusters/sanju/hosts.txt" -l "/var/log/hmc/hmc.txn.3.log"
    [2012:07:31 03:33:45][INFO][HMCTxnUtils][HMCTxnUtils.php:106][execBackgroundProcess]: Output from process, command=/usr/bin/php ./addNodes/obtainNodesInfo.php, txnId=3, output=Executing /usr/bin/php ./addNodes/obtainNodesInfo.php sanju root 3 101 4 /var/run/hmc/clusters/sanju/hosts.txt > /var/log/hmc/hmc.txn.3.log 2>&1

    I can tell that the host name is being pulled from some database and that it is not returning the correct value. Is there a way to see what’s stored in the DB? I’m starting to think that I have an atypical environment setup causing this issue, but I’m not sure what it is.
    I’m not sure what Ctrl+F “(err)” means. I think you mean searching the logs for the error message, which is pasted above.

    Also, how can I deploy a 2-node cluster to start with? The instructions I have say to set up an HMC server first and then proceed with adding nodes. Thanks in advance for your response.

    regards
    Sanjeev

    #7881

    Also it appears the Monitoring pages were just being blocked by my firewall. I can access them just fine now.

    #7872

    Sanjeev,

    Are you using Amazon EC2?
    Maybe try deploying a 2-node cluster before attempting to add a node?
    Maybe try using a different AMI?
    hostname -f should not return localhost.localdomain but the FQDN shown on your EC2 console.
    That is, each line in /etc/hosts should have the format ( hostname -i hostname -f name ): the IP address, then the FQDN, then a short name.
    When you say static IPs, do you mean Amazon’s Elastic IPs? I think I tried this 2 times with similar issues.

    It looks like the FQDN you pass in the hosts.txt file is not being resolved to the node you are trying to add.

    What does Ctrl+F “(err)” show you?

    Hope this helps in some way,
    Miguel
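
The /etc/hosts format described in this reply (IP address, FQDN, short name) can be checked mechanically before uploading a node list. Here is a minimal sketch in Python; the function names parse_hosts and missing_fqdns are hypothetical, and the check looks only at the text of the two files, not at DNS.

```python
def parse_hosts(text):
    """Map each name found in /etc/hosts-style text to its IP address."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        ip, *hostnames = line.split()
        for name in hostnames:
            names.setdefault(name, ip)  # keep the first entry seen for a name
    return names

def missing_fqdns(hosts_text, node_list_text):
    """Return node-list entries that are absent from the hosts text or lack a domain part."""
    names = parse_hosts(hosts_text)
    wanted = [n.strip() for n in node_list_text.splitlines() if n.strip()]
    return [n for n in wanted if n not in names or "." not in n]
```

An empty result from missing_fqdns means every entry in the node list is a dotted name that also appears in the hosts text; it does not prove that the name reaches the intended machine.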

    #7846

    Sanjeev
    Participant

    Hi Miguel,

    Thanks for your response. Below is what my /etc/hosts file looks like on both machines:

    127.0.0.1 localhost.localdomain localhost
    ::1 localhost6.localdomain6 localhost6
    9.30.192.102 cisvltsvm-02.svl.ibm.com cisvltsvm-02 # This is my HMC
    9.30.192.100 cisvltsvm-01.svl.ibm.com cisvltsvm-01 # This is the node I’m trying to add

    Now, since both IPs are static, I shouldn’t need to care about the hosts file, but I added the entries just in case.
    My hostdetai.txt file contains the following:

    cisvltsvm-01.svl.ibm.com

    When I try to add a new node, the node discovery fails, and I can see why: the node name is incorrect. I don’t know where it is picking up the junk node name ÿþc from.

    I’m certain that I have prepared my machine precisely to the specifications on the website. I’m clueless about the above error, as my searches turn up nothing on it. Please advise.

    regards
    Sanjeev
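
A note on the junk name ÿþc: the characters ÿþ are what the bytes 0xFF 0xFE look like when read as Latin-1, and 0xFF 0xFE is the UTF-16 little-endian byte-order mark. That is consistent with a host list file saved as UTF-16 (some editors do this by default), with c being the first letter of cisvltsvm-01. The thread does not confirm this, so treat it as one possible explanation. A minimal Python sketch for checking a file, where detect_bom and check_hosts_file are hypothetical helper names:

```python
import codecs

# Byte-order marks that an editor may prepend to a text file.
BOMS = {
    codecs.BOM_UTF8: "UTF-8",
    codecs.BOM_UTF16_LE: "UTF-16 LE",
    codecs.BOM_UTF16_BE: "UTF-16 BE",
}

def detect_bom(data):
    """Return the name of the BOM that the bytes start with, or None."""
    for bom, name in BOMS.items():
        if data.startswith(bom):
            return name
    return None

def check_hosts_file(path):
    """Read the first bytes of the file at path and report any BOM found."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

If a BOM is reported, re-saving the host list as plain ASCII text with no BOM would be the obvious thing to try.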

    #7837

    Sanjeev,

    I saw you were trying to add a node, so I gave it a try and was successful. Here is what worked for me.

    This is what my /etc/hosts file looks like on all my nodes:

    127.0.0.1 localhost localhost.localdomain

    10.206.30.48 domU-12-31-39-14-1D-C2.compute-1.internal Master
    10.190.191.106 ip-10-190-191-106.ec2.internal Worker
    10.206.30.48 domU-12-31-39-14-1D-C2.compute-1.internal Worker2

    10.140.16.66 ip-10-140-16-66.ec2.internal Worker4 # the new node I added

    Here is what my addnodes file looks like:

    ip-10-140-16-66.ec2.internal # you only put the fqdn of the nodes you want to add

    I passed in one of my Amazon private keys, blue.pem, which grants access to all my nodes, but I imagine that it will also work with the root private key that you generated with ssh-keygen on your HMC deployment node.

    As you mentioned earlier, the HMC cluster management page does not update to reflect the added node. However, everywhere else in the GUI, including Ganglia, you can see there is 1 more worker node ( data / task / slave ).

    Also, I did notice that none of the normal monitoring pages ( namenode, jobtracker, hbase etc ) are working. They can normally be reached through hostname:port, e.g. localhost:50070 for the namenode, or on EC2 e.g. http://ec2-50-17-141-9.compute-1.amazonaws.com:50070. They display an “Unable to connect” error message.

    Hope this helps,
    Miguel

    #7792

    Sanjeev
    Participant

    Hi Sasha,

    Thanks for your response. Answers to your questions below:
    1) Uninstalled hmc & puppet from nodes.
    2) Yes. The host file for HMC upload
    3) Yes, I re-installed HMC. Details below:

    1) Re-configured HMC and one node machine using a static IP and host name. Verified that the machine resolves properly. Password-less login still works on both.
    2) Now I’m trying to add just one node to my cluster (to keep things simple).
    3) At the “Node Discovery and Preparation” step I see Finding reachable nodes: 1 / 2 in progress, 1 succeeded. This means that the cluster is able to discover ONLY itself.

    The error message in the hmc.log file is as below:

    [2012:07:31 03:33:44][ERROR][sequentialScriptExecutor][sequentialScriptRunner.php:261][]: Got error while getting hostInfo for ÿþc
    [2012:07:31 03:33:45][INFO][HMCTxnUtils][HMCTxnUtils.php:82][execBackgroundProcess]: Trying to background a new process, cluster=sanju, txnId=3, command=/usr/bin/php ./addNodes/obtainNodesInfo.php, args=sanju root 3 101 4 /var/run/hmc/clusters/sanju/hosts.txt, logFile=/var/log/hmc/hmc.txn.3.log, execCommand=/usr/bin/php /usr/share/hmc//php/util/BackgroundExecutor.php -t "3" -c "/usr/bin/php ./addNodes/obtainNodesInfo.php" -a "sanju root 3 101 4 /var/run/hmc/clusters/sanju/hosts.txt" -l "/var/log/hmc/hmc.txn.3.log"
    [2012:07:31 03:33:45][INFO][HMCTxnUtils][HMCTxnUtils.php:106][execBackgroundProcess]: Output from process, command=/usr/bin/php ./addNodes/obtainNodesInfo.php, txnId=3, output=Executing /usr/bin/php ./addNodes/obtainNodesInfo.php sanju root 3 101 4 /var/run/hmc/clusters/sanju/hosts.txt > /var/log/hmc/hmc.txn.3.log 2>&1

    HMC transaction log says:

    PHP Notice: Undefined index: cluster_name in /usr/share/hmc/php/db/HMCDBAccessor.php on line 1152

    Please advise!
    Thanks

    #7785

    Sasha J
    Moderator

    1) Not exactly…
    During the HMC install, puppet is also installed as a dependency and configured as a puppet master. You do not want to have multiple puppet masters in your cluster.

    3) In your host file which you provide to HMC, right?
    4) This means that something went wrong during installation…

    Try to rerun this, and please remove hmc and puppet from all nodes except your main one.

    Thank you!
    Sasha

    #7782

    Sanjeev
    Participant

    @ Sasha: Thanks for prompt response

    1) Regarding installing hmc on nodes: http://hortonworks.com/download/thankyou_hdp1a/ says the following:

    Starting HMC and installing HDP from HMC:

    Lastly, start up HMC and install the rest of HDP. This only needs to be completed on one machine, that will act as the hmc master node.
    ==> It appears that I interpreted the steps correctly the first time and got them wrong the next time. Re-reading the steps made me think that I needed to install hmc on all nodes and just start the hmc service on the one that will act as master. I guess I wore myself out, or perhaps the instructions could be more explicit.

    Anyhow, my assumption is that as long as I’m not starting hmc services on my nodes, it should not cause any harm. Correct?

    2) Based on your explanation, I’m good with the password-less authentication.
    3) I have FQDNs in my host files.
    4) The NameNode UI and JobTracker UI show just the HMC node.

    #7780

    Sasha J
    Moderator

    Sanjeev,
    First, you should not install HMC on more than one node.
    For ssh keys, it does not really matter which key you use. The only mandatory thing is that nodes can access each other without a password.

    Your hosts file should have FQDNs in it.

    After adding new nodes, your NameNode UI and JobTracker UI should show you the newly added nodes.
    Let me do some research on new nodes’ visibility through HMC.

    Thank you!
    Sasha
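
The requirement in this reply, that nodes can reach each other over ssh without a password, can be tested non-interactively. A minimal Python sketch follows, assuming the OpenSSH client is installed and that root is the login user (the transaction logs elsewhere in this thread show root as an argument, but that is an assumption here). The helper names are hypothetical.

```python
import subprocess

def ssh_check_command(host, user="root", timeout=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never prompt for a password or passphrase
        "-o", f"ConnectTimeout={timeout}",   # give up on unreachable hosts
        f"{user}@{host}",
        "true",                              # trivial remote command; only the exit status matters
    ]

def passwordless_ssh_ok(host, user="root"):
    """Return True if the ssh command exits with status 0."""
    result = subprocess.run(ssh_check_command(host, user), capture_output=True)
    return result.returncode == 0
```

Because BatchMode disables all prompts, a first connection to a host whose key has not been accepted yet may also fail, so a False result is not always a key problem.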

    #7778

    Sanjeev
    Participant

    @ Sasha: Thanks for your reply. I was able to get past these errors. The two main things I changed with the new install are as below:

    1. Installed hmc on the nodes as well (missed out earlier).
    2. Prepared the password-less authentication using DSA instead of RSA, based on a KB article on your website. Does this really matter?

    To answer your questions above:
    1) I’m resolving hosts using /etc/hosts on all nodes. My nodes resolve fine from each other.
    2) hostname -f on any node returns the correct hostname, for example: myhost.mydomain.com
    3) The host file that I upload to HMC looks something like below (using FQDNs and not including the HMC host):
    hostname1
    hostname2
    hostname3

    Now I’m seeing a new issue: even though the nodes got added successfully (as far as the wizard claims), I can’t see them in HMC. Sounds familiar?

    #7615

    Sasha J
    Moderator

    @sanjeev

    Instructions for submitting logs are in a sticky in this forum.

    How did you ensure your host resolves?

    Can you post the output of:

    hostname -f

    and the hosts file that you upload to HMC?

    thanks in advance,

    Sasha

    #7549

    Sanjeev
    Participant

    Got the same error as mentioned in thread #6596 above. Used the above “NOT” recommended workaround to get rid of the duplicate node issue; however, the main issue still persists. Verified that the host name resolves correctly on the master and on the node being added (using the /etc/hosts file to resolve locally). Any clues? Please let me know how to submit logs.

    #6630

    Sasha J
    Moderator

    Again,
    there are not enough details to say anything reasonable…
    Please look at the hosts list. Based on the error message, it seems like incorrect host IPs were passed to HMC…

    Thank you!
    Sasha

    #6627

    Hi Sasha,

    thanks for your post.

    While I was trying to add the new node, it failed (I’m not sure of the reason; it may be a key issue). When I tried to reinstall, I got the following error:
    “Some hosts in the given file are already being used in cluster”
    and the reinstallation failed.
    Have you faced such an issue before? If so, what is the resolution?

    Thanks,
    Binish

    #6604

    Sasha J
    Moderator

    Just to make this clear:
    when you add new nodes to the cluster, you have to:
    1. Use the SAME security key as you used before,
    2. Provide only the list of new hosts, not the whole list of all cluster nodes.

    I just ran this exercise in Amazon (built a 2-node cluster, then added 2 more nodes to it).

    It works as expected with no problems.

    Thank you!
    Sasha
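
Rule 2 in this reply (list only the new hosts) is the kind of thing the “Some hosts in the given file are already being used in cluster” message appears to guard against. The following Python sketch is a way to pre-check a host list against the names already in the cluster; it is not HMC's own logic, and split_new_hosts is a hypothetical name.

```python
def split_new_hosts(requested, cluster_hosts):
    """Split requested hostnames into (new, already_in_cluster).

    Comparison ignores case, since hostnames are case-insensitive.
    """
    existing = {h.strip().lower() for h in cluster_hosts}
    new, duplicates = [], []
    for host in requested:
        host = host.strip()
        if not host:
            continue  # skip blank lines
        if host.lower() in existing:
            duplicates.append(host)
        else:
            new.append(host)
    return new, duplicates
```

Only the first list would go into the file uploaded to the add-nodes wizard; anything in the second list is already part of the cluster.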

    #6603

    Sasha J
    Moderator

    Binish,
    there are not enough details; I need to see more logs.
    However, it seems like connectivity from the HMC node to the new nodes does not work as needed (wrong key?). This may also be related to name resolution. Are you sure all nodes can resolve all names correctly?

    Please do NOT make any changes to the HMC code; this may lead to completely unpredictable results and unresolvable problems!

    Thank you!
    Sasha

    #6597

    Found a workaround:

    vim /usr/share/hmc/php/frontend/addNodes.php
    Just before the duplicate entry comparison, I updated the code as follows:
    $numDupHosts = 0;
