Ambari fails to register slaves

This topic contains 8 replies, has 4 voices, and was last updated by  Prateek Goel 2 months, 3 weeks ago.

  • Creator
    Topic
  • #29983

    Rupert Bailey
    Participant

    Hello team, I failed to register the slaves using the Ambari web interface.

    The errors reported against the failed nodes are (truncated for presentation):
    STDOUT Cluster primary OS type is redhat6 and local OS type is centos6
    STDERR Connection to 192.168.1.21 closed.
    STDOUT sudo-1.8.6p3-7.el6.x86_64
    STDERR Connection to 192.168.1.21 closed.
    STDERR Connection to 192.168.1.21 closed.
    STDERR Registering with the server…
    Registration with the server failed.

    Now I’m not sure whether this is a psql error, as the statement below does not work:
    [root@master ~]# psql -U ambari-server -d ambari
    psql: FATAL: no pg_hba.conf entry for host "[local]", user "ambari-server", database "ambari", SSL off

    But it seems more like a networking issue, and frankly I’m struggling to diagnose networking errors. I might have a naming issue between the slaves and the master.
    jekyll 192.168.1.68 (host os)
    master 192.168.1.20 (guest vmware)
    slave1 192.168.1.21 (guest vmware)
    slave2 192.168.1.22 (guest vmware)
    slave3 192.168.1.23 (guest vmware)
    virtual networking: bridged

    Also ambari-agents were successfully installed on the slaves using ambari-server:
    [root@slave3 ~]# ambari-agent status
    Found ambari-agent PID: 3426
    ambari-agent running.
    Agent PID at: /var/run/ambari-agent/ambari-agent.pid
    Agent outout at: /var/log/ambari-agent/ambari-agent.out
    Agent log at: /var/log/ambari-agent/ambari-agent.log

    Does this provide any clues? Which logs, if any, would provide hints? I’m running out of ideas. Virtual network diagnosis is brutal. :(
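One way to narrow down whether this is a networking problem is to check, from a slave, that the master accepts TCP connections on the ports Ambari uses. The following is a minimal sketch, assuming Python 3 is available on the node. The helper name `port_open` is hypothetical, and the ports in the usage comment (8080 for the web/API endpoint, 8441 for agent registration) are taken from URLs quoted in this thread rather than from a verified configuration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example usage (run on a slave; host and ports are assumptions):
#   for port in (8080, 8441):
#       print(port, port_open("192.168.1.20", port))
```

A `False` result only says the TCP connection could not be made; it does not distinguish a firewall from a stopped service or a name that does not resolve.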

Viewing 8 replies - 1 through 8 (of 8 total)


  • Author
    Replies
  • #55364

    Prateek Goel
    Participant

    Hi Jeff Sposetti,
    I am using ambari-1.2.3.7.
    I found the solution to my problem: I had not disabled SELinux on the master and the slaves, which is why I was getting the SSL certificate error while registering the clients.
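On a running node, `getenforce` prints the current SELinux mode. As a hedged sketch of checking the configured mode instead, the snippet below parses the text of /etc/selinux/config. The helper name `selinux_mode` is hypothetical, and the sample content is illustrative rather than copied from these machines.

```python
def selinux_mode(config_text):
    """Return the SELINUX= value (enforcing, permissive or disabled), or None."""
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "SELINUX":
            return value.strip().lower()
    return None


# Illustrative file content, not taken from the cluster in this thread.
sample = """# This file controls the state of SELinux on the system.
SELINUX=enforcing
SELINUXTYPE=targeted
"""
print(selinux_mode(sample))  # enforcing
```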

    #55355

    Jeff Sposetti
    Moderator

    Oh, it’s --version (dash-dash, two dashes).

    #55335

    Prateek Goel
    Participant

    I am getting this result after executing the command ambari-server –version:

    Using python /usr/bin/python2.6
    Usage: /usr/sbin/ambari-server {start|stop|restart|setup|upgrade|status|upgradestack} [options]

    #55129

    Jeff Sposetti
    Moderator

    Can you confirm which version of Ambari you are using?

    ambari-server --version

    #55121

    Prateek Goel
    Participant

    I am getting an error while registering a slave with the Ambari master.

    I debugged ambari-agent and found this error in ambari-agent.log:

    INFO 2014-06-02 14:53:27,818 Controller.py:99 - Unable to connect to: https://aiq-master:8441/agent/v1/register/aiq-master
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 84, in registerWithServer
        response = self.sendRequest(self.registerUrl, data)
      File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 233, in sendRequest
        self.cachedconnect = security.CachedHTTPSConnection(self.config)
      File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 77, in __init__
        self.connect()
      File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 82, in connect
        self.httpsconn.connect()
      File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 66, in connect
        ca_certs=server_crt)
      File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
        suppress_ragged_eofs=suppress_ragged_eofs)
      File "/usr/lib64/python2.6/ssl.py", line 120, in __init__
        self.do_handshake()
      File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
        self._sslobj.do_handshake()
    SSLError: [Errno 8] _ssl.c:492: EOF occurred in violation of protocol
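When the agent log is long, it can help to pull out just the URL the agent tried to reach and the exception that ended the traceback. The sketch below does that with two regular expressions; it assumes the log wording shown above ("Unable to connect to:" followed by a Python traceback), and `summarize_agent_error` is a hypothetical helper, not part of Ambari.

```python
import re


def summarize_agent_error(log_text):
    """Return (url, error) for the first failed connection found in log_text."""
    url_match = re.search(r"Unable to connect to:\s*(\S+)", log_text)
    # The final line of a Python traceback starts with the exception class name.
    err_match = re.search(
        r"^\s*(\w+(?:Error|Exception)\b.*)$", log_text, re.MULTILINE
    )
    url = url_match.group(1) if url_match else None
    error = err_match.group(1).strip() if err_match else None
    return url, error


# Abbreviated copy of the log excerpt above.
excerpt = """INFO 2014-06-02 14:53:27,818 Controller.py:99 - Unable to connect to: https://aiq-master:8441/agent/v1/register/aiq-master
Traceback (most recent call last):
  File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 8] _ssl.c:492: EOF occurred in violation of protocol
"""
url, error = summarize_agent_error(excerpt)
print(url)
print(error)
```

Here the summary points at the TLS handshake on port 8441 rather than at name resolution or a refused connection, which is a different failure from the hostname problem discussed earlier in the thread.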

    #30020

    tedr
    Moderator

    Hi Rupert,

    It is good to hear that you fixed the problem. I really don’t know why, but using IP addresses is flaky at best; our instructions point out that fully qualified domain names (FQDNs) should be used. The fully qualified host name is what you get when you run the command 'hostname -f'. It appears that 'slave[x].localdomain' didn’t work because that version of the hostname does not exist in your /etc/hosts files. For the FQDN to work, the host name in the /etc/sysconfig/network file must have a matching name in /etc/hosts. Usually the entries in /etc/hosts look something like:

    192.168.1.xxx host.domain host
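The matching rule described here can be checked mechanically. The following is a minimal sketch, assuming the two files are read as plain text; the function names are hypothetical, and the sample values mirror slave1's files as posted in this thread.

```python
def parse_hosts(hosts_text):
    """Map every name and alias in /etc/hosts content to its IP address."""
    names = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *aliases = line.split()
        for name in aliases:
            names.setdefault(name, ip)  # first entry wins
    return names


def configured_hostname(network_text):
    """Return the HOSTNAME= value from /etc/sysconfig/network content."""
    for line in network_text.splitlines():
        key, _, value = line.strip().partition("=")
        if key == "HOSTNAME":
            return value.strip()
    return None


def hostname_is_mapped(hosts_text, network_text):
    """True when the configured hostname has a non-loopback /etc/hosts entry."""
    ip = parse_hosts(hosts_text).get(configured_hostname(network_text))
    return ip is not None and not ip.startswith("127.") and ip != "::1"


slave1_hosts = """127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.20 master
192.168.1.21 slave1
"""
slave1_network = "NETWORKING=yes\nHOSTNAME=slave1.localdomain\n"

print(hostname_is_mapped(slave1_hosts, slave1_network))  # False
```

With an entry in the form shown above, for example `192.168.1.21 slave1.localdomain slave1`, the same check returns True.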

    Thanks,
    Ted.

    #30002

    Rupert Bailey
    Participant

    Thank you for responding, tedr.

    I just fixed it by using the hostname, not the IP address.
    I was using slave[x].localdomain and then 192.168.1.xxx, but the unqualified name (slave[x]) did it!

    I’m still curious about how this worked, though, if you can spare a second. Is this the information you need?

    [root@master ~]# ssh slave1 "cat /etc/hosts"
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.1.20 master
    192.168.1.21 slave1
    192.168.1.22 slave2
    192.168.1.23 slave3
    [root@master ~]# ssh slave2 "cat /etc/hosts"
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.1.20 master
    192.168.1.21 slave1
    192.168.1.22 slave2
    192.168.1.23 slave3
    [root@master ~]# ssh slave3 "cat /etc/hosts"
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.1.20 master
    192.168.1.21 slave1
    192.168.1.22 slave2
    192.168.1.23 slave3

    [root@master ~]# ssh slave1 "cat /etc/sysconfig/network;"
    NETWORKING=yes
    HOSTNAME=slave1.localdomain
    [root@master ~]# ssh slave2 "cat /etc/sysconfig/network;"
    NETWORKING=yes
    HOSTNAME=slave2.localdomain
    [root@master ~]# ssh slave3 "cat /etc/sysconfig/network;"
    NETWORKING=yes
    HOSTNAME=slave3.localdomain

    This also demonstrates that the id_rsa key is being used successfully to pull this information.

    Also I have been able to run this on a slave to test master response:
    [root@slave1 ~]# curl -u admin http://192.168.1.20:8080/api/v1/hosts
    Enter host password for user 'admin':
    {
    "href" : "http://192.168.1.20:8080/api/v1/hosts",
    “items” : [
    {
    "href" : "http://192.168.1.20:8080/api/v1/hosts/slave2",
    "Hosts" : {
    "host_name" : "slave2"
    }
    },
    {
    "href" : "http://192.168.1.20:8080/api/v1/hosts/slave1.localdomain",
    "Hosts" : {
    "host_name" : "slave1.localdomain"
    }
    },
    {
    "href" : "http://192.168.1.20:8080/api/v1/hosts/slave1",
    "Hosts" : {
    "host_name" : "slave1"
    }
    },
    {
    "href" : "http://192.168.1.20:8080/api/v1/hosts/slave3",
    "Hosts" : {
    "host_name" : "slave3"
    }
    }
    ]
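The listing above shows slave1 registered twice, once as slave1 and once as slave1.localdomain. A small sketch that spots such duplicates in a response of this shape follows; `duplicate_registrations` is a hypothetical helper, and the sample is a trimmed, well-formed copy of the output above with only the fields the code reads.

```python
import json
from collections import defaultdict


def duplicate_registrations(api_response_text):
    """Group host names by short name; return groups with more than one entry."""
    payload = json.loads(api_response_text)
    groups = defaultdict(list)
    for item in payload.get("items", []):
        host_name = item["Hosts"]["host_name"]
        # The short name is everything before the first dot.
        groups[host_name.split(".", 1)[0]].append(host_name)
    return {short: sorted(names) for short, names in groups.items() if len(names) > 1}


sample_response = """
{
  "items" : [
    { "Hosts" : { "host_name" : "slave2" } },
    { "Hosts" : { "host_name" : "slave1.localdomain" } },
    { "Hosts" : { "host_name" : "slave1" } },
    { "Hosts" : { "host_name" : "slave3" } }
  ]
}
"""
print(duplicate_registrations(sample_response))
# {'slave1': ['slave1', 'slave1.localdomain']}
```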

    #29998

    tedr
    Moderator

    Hi Rupert,

    What are the contents of the /etc/hosts and /etc/sysconfig/network files on each of these slaves? What did you enter in the list of hosts that you wanted to set the cluster up on? And did you make sure to set up passwordless ssh from the master to all of the other boxes?

    Thanks,
    Ted.
