Ambari Installation Failed

This topic contains 10 replies, has 5 voices, and was last updated by tedr 1 year, 3 months ago.

    Topic #19886

    Santosh R
    Member

    Hi,
    I am trying to install Ambari on a 3-node cluster. After logging in to Ambari for the first time, I entered the list of hostnames for the 3 nodes, but when I proceed, registration of the hosts FAILS for the 2 slave nodes.
    When I check the error log, it appears that certain features are unavailable.
    Following is the log for one of the slaves:

    STDOUT

    STDERR
    STDOUT

    STDERR
    STDOUT
    Verifying Python version compatibility…
    Using python /usr/bin/python2.6
    Checking for previously running Ambari Agent…
    Starting ambari-agent
    Verifying ambari-agent process status…
    Ambari Agent successfully started
    Agent PID at: /var/run/ambari-agent/ambari-agent.pid
    Agent log at: /var/log/ambari-agent/ambari-agent.out
    ('hostname: ok slave2.tpbidw.com
    ip: ok 10.6.120.206
    cpu: ok Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
    Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
    memory: ok 3.76519 GB
    disks: ok
    Filesystem Size Used Avail Use% Mounted on
    /dev/mapper/vg_slave2-lv_root
    50G 5.9G 41G 13% /
    tmpfs 1.9G 260K 1.9G 1% /dev/shm
    /dev/sda1 485M 38M 423M 9% /boot
    /dev/mapper/vg_slave2-lv_home
    241G 189M 228G 1% /home
    os: ok CentOS release 6.3 (Final)
    iptables: ok
    Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts bytes target prot opt in out source destination

    Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts bytes target prot opt in out source destination

    Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts bytes target prot opt in out source destination
    selinux: ok SELINUX=disabled
    SELINUXTYPE=targeted
    yum: ok yum-3.2.29-30.el6.centos.noarch
    rpm: ok rpm-4.8.0-27.el6.x86_64
    openssl: ok openssl-1.0.0-20.el6_2.5.x86_64
    curl: ok curl-7.19.7-26.el6_2.4.x86_64
    wget: ok wget-1.12-1.4.el6.x86_64
    net-snmp: ok net-snmp-5.5-44.el6.x86_64
    net-snmp-utils: ok net-snmp-utils-5.5-44.el6.x86_64
    ntpd: UNAVAILABLE
    ruby: UNAVAILABLE
    puppet: UNAVAILABLE
    nagios: UNAVAILABLE
    ganglia: UNAVAILABLE
    passenger: UNAVAILABLE
    hadoop: ok hadoop-1.1.2.21-1.el6.x86_64
    yum_repos: ok
    HDP-1.2.0 HDP 53
    HDP-UTILS-1.1.0.15 Hortonworks Data Platform Utils Version – HDP-UTILS-1. 52
    HDP-epel HDP-epel 8,555
    zypper_repos: UNAVAILABLE
    ', None)
    ('INFO 2013-04-02 16:19:07,281 security.py:49 – SSL Connect being called.. connecting to the server
    INFO 2013-04-02 16:19:07,352 Controller.py:103 – Unable to connect to: https://master.tpbidw.com:8441/agent/v1/register/slave2.tpbidw.com
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 88, in registerWithServer
        response = self.sendRequest(self.registerUrl, data)
      File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 237, in sendRequest
        self.cachedconnect = secur

Viewing 10 replies - 1 through 10 (of 10 total)


    Replies

    #27108

    tedr
    Moderator

    Hi Gunnar,

    It looks like the only thing you are missing for removing Ambari is 'yum erase ambari-agent' and 'yum erase ambari-server', which should be done before any 'rm' of the folders. Ambari won't show up in the list when you grep for HDP. Unfortunately, Ambari uninstallation is not documented yet.

    Thanks,
    Ted.
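    A minimal Python 3 sketch of the ordering described above: stop the services, 'yum erase' the Ambari packages, and only then 'rm' the leftover folders. The function name removal_plan and the is_server flag are hypothetical; the sketch only prints the commands (a dry run) so they can be reviewed and run by hand, and the folder list is a subset of the one posted in this thread, not an official uninstall procedure.

```python
from typing import List


def removal_plan(is_server: bool) -> List[str]:
    """Return (but do not execute) an ordered list of removal commands.

    Hypothetical helper: the ordering follows the forum advice (stop,
    then 'yum erase', then 'rm'); paths are taken from this thread only.
    """
    cmds: List[str] = []
    # 1. Stop the running daemon(s) first.
    cmds.append("ambari-agent stop")
    if is_server:
        cmds.append("ambari-server stop")
        # 'reset' needs the server package, so it must run before 'yum erase'.
        cmds.append("ambari-server reset")
    # 2. Remove the Ambari packages with yum before deleting any folders.
    cmds.append("yum erase ambari-agent")
    if is_server:
        cmds.append("yum erase ambari-server")
    # 3. Only now delete leftover repo definitions, config, logs and state.
    cmds += [
        "rm -rf /etc/yum.repos.d/ambari* /etc/yum.repos.d/HDP*",
        "rm -rf /etc/ambari*",
        "rm -rf /var/log/ambari*",
        "rm -rf /var/lib/ambari*",
    ]
    return cmds


if __name__ == "__main__":
    # Dry run: print the plan for a server host.
    for cmd in removal_plan(is_server=True):
        print(cmd)
```

    On a slave-only host, call removal_plan(is_server=False), which leaves out the ambari-server commands.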

    #27041

    Hi Ted,

    I gave up on this and reimaged the servers. That fixed the issue.

    The topic of uninstalling Ambari is an interesting one: where is the how-to documented? So far, I do the following:

    ambari-agent stop (or ambari-server stop)
    ambari-server reset
    yum list installed | grep HDP   (then remove everything it lists)
    rm -rf /etc/yum.repos.d/ambari* /etc/yum.repos.d/HDP*
    rm -rf /etc/ambari* /etc/hadoop* /etc/hbase /etc/hive /etc/impala /etc/oozie /etc/pig /etc/sqoop /etc/zookeeper /etc/flume*
    rm -rf /var/log/ambari* /var/log/flume* /var/log/hadoop* /var/log/hbase /var/log/hive /var/log/hue /var/log/oozie /var/log/zookeeper
    rm -rf /var/lib/ambari*
    userdel puppet;userdel ambari-qa;userdel mapred;userdel hdfs;userdel rrdcached;userdel hbase;userdel hive;userdel hcat;userdel oozie;userdel sqoop;userdel zookeeper

    What am I missing for removal?

    I suspect that this might be related to firewalls somehow. I noticed that ip6tables is still enabled, so I am testing with iptables, ip6tables, and the firewall itself turned off using the RHEL setup utility.
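    Before turning the firewalls off wholesale, it can help to check whether an agent host can open a plain TCP connection to the server's registration port at all (8441 in the logs in this thread). A minimal Python 3 sketch; port_reachable is a hypothetical helper name. A successful connect only rules out a firewall block: it says nothing about whether the SSL certificate check that follows will pass.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out (e.g. silently dropped by a
        # firewall rule), or the hostname could not be resolved.
        return False


if __name__ == "__main__":
    # Run on a slave; replace the hostname with your Ambari server.
    print(port_reachable("master.tpbidw.com", 8441))
```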

    #26997

    tedr
    Moderator

    Hi Gunnar,

    First, let me apologize for not replying sooner. It looks like you are having SSL certificate problems. The best course of action would be to uninstall ambari-server and ambari-agent on all nodes, make sure to remove the /var/lib/ambari-*/keys directory, and then re-install.

    Thanks,
    Ted.
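    The key-cleanup step above, sketched in Python 3 with a dry-run default so the matching directories can be reviewed before anything is deleted. Both function names and the root parameter are hypothetical (root only exists so the sketch can be tried against a scratch directory); run the real removal only after ambari-server and ambari-agent have been uninstalled.

```python
import glob
import os
import shutil
from typing import List


def stale_key_dirs(root: str = "/") -> List[str]:
    """List existing directories matching <root>/var/lib/ambari-*/keys."""
    pattern = os.path.join(root, "var", "lib", "ambari-*", "keys")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


def remove_key_dirs(root: str = "/", dry_run: bool = True) -> List[str]:
    """Return the matching key directories; delete them only if dry_run is False."""
    found = stale_key_dirs(root)
    if not dry_run:
        for path in found:
            shutil.rmtree(path)
    return found


if __name__ == "__main__":
    # Dry run: show what would be removed on this host.
    for path in remove_key_dirs():
        print(path)
```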

    #26131

    Part 2:

    ('INFO 2013-05-22 22:04:38,081 security.py:48 – SSL Connect being called.. connecting to the server
    INFO 2013-05-22 22:04:38,177 Controller.py:103 – Unable to connect to: https://sq010.xxxxxx.xx.com:8441/agent/v1/register/sq032.xxxxxx.xx.com
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 88, in registerWithServer
        response = self.sendRequest(self.registerUrl, data)
      File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 235, in sendRequest
        self.cachedconnect = security.CachedHTTPSConnection(self.config)
      File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 76, in __init__
        self.connect()
      File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 81, in connect
        self.httpsconn.connect()
      File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 65, in connect
        ca_certs=server_crt)
      File "/usr/lib64

    [root@sq010 .ssh]# ssh sq010.xxxxxx.xx.com 'ntpstat'
    synchronised to NTP server (xx.xxx.xxx.123) at stratum 4
    time correct to within 189 ms
    polling server every 1024 s
    [root@sq010 .ssh]# ssh sq032.xxxxxx.xx.com 'ntpstat'
    synchronised to NTP server (xx.xxx.xxx.123) at stratum 4
    time correct to within 170 ms
    polling server every 1024 s
    [root@sq010 .ssh]# ssh sq034.xxxxxx.xx.com 'ntpstat'
    synchronised to NTP server (xx.xxx.xxx.123) at stratum 4
    time correct to within 198 ms
    polling server every 1024 s
    [root@sq010 .ssh]# ssh sq035.xxxxxx.xx.com 'ntpstat'
    synchronised to NTP server (xx.xxx.xxx.123) at stratum 4
    time correct to within 169 ms
    polling server every 1024 s
    [root@sq010 .ssh]#

    The ambari-server log shows:

    06:07:35,744 INFO ClusterControllerImpl:92 – Using resource provider org.apache.ambari.server.controller.internal.HostResourceProvider for request type Host
    06:07:35,821 INFO ClusterControllerImpl:92 – Using resource provider org.apache.ambari.server.controller.internal.HostResourceProvider for request type Host
    06:08:32,159 WARN nio:651 – javax.net.ssl.SSLException: Received fatal alert: unknown_ca
    06:08:55,719 WARN nio:651 – javax.net.ssl.SSLException: Received fatal alert: unknown_ca
    06:08:58,781 WARN nio:651 – javax.net.ssl.SSLException: Received fatal alert: unknown_ca
    06:09:04,839 WARN nio:651 – javax.net.ssl.SSLException: Received fatal alert: unknown_ca
    06:09:32,540 WARN nio:651 – javax.net.ssl.SSLException: Received fatal alert: unknown_ca
    06:10:13,694 WARN nio:651 – javax.net.ssl.SSLException: Received fatal alert: unknown_ca
    06:10:13,939 WARN nio:651 – javax.net.ssl.SSLException: Received fatal alert: unknown_ca

    tcptrack shows the connection on port 8441 in the CLOSING state all the time.

    #26130

    This is what I see (part 1):

    STDOUT

    STDERR
    STDOUT

    STDERR
    STDOUT
    Verifying Python version compatibility…
    Using python /usr/bin/python2.6
    Checking for previously running Ambari Agent…
    Starting ambari-agent
    Verifying ambari-agent process status…
    Ambari Agent successfully started
    Agent PID at: /var/run/ambari-agent/ambari-agent.pid
    Agent log at: /var/log/ambari-agent/ambari-agent.out
    ('hostname: ok sq032.xxxxxx.xx.com
    ip: ok xx.xxx.xx43
    cpu: ok Quad-Core AMD Opteron(tm) Processor 8381 HE
    Quad-Core AMD Opteron(tm) Processor 8381 HE
    Quad-Core AMD Opteron(tm) Processor 8381 HE
    Quad-Core AMD Opteron(tm) Processor 8381 HE
    memory: ok 15.5796 GB
    disks: ok
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda3 118G 17G 96G 15% /
    tmpfs 7.8G 0 7.8G 0% /dev/shm
    /dev/sda1 504M 62M 417M 13% /boot
    os: ok *****************************************************************
    iptables: ok
    Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts bytes target prot opt in out source destination

    Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
    pkts bytes target prot opt in out source destination

    Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts bytes target prot opt in out source destination
    selinux: ok SELINUX=disabled
    SELINUXTYPE=targeted
    yum: ok yum-3.2.29-22.el6.noarch
    rpm: ok rpm-4.8.0-19.el6.x86_64
    openssl: ok openssl-1.0.0-20.el6.x86_64
    openssl-1.0.0-20.el6.i686
    curl: ok curl-7.19.7-26.el6_1.2.x86_64
    wget: ok wget-1.12-1.4.el6.x86_64
    net-snmp: ok net-snmp-5.5-37.el6.x86_64
    net-snmp-utils: ok net-snmp-utils-5.5-37.el6.x86_64
    ntpd: UNAVAILABLE
    ruby: UNAVAILABLE
    puppet: UNAVAILABLE
    nagios: UNAVAILABLE
    ganglia: UNAVAILABLE
    passenger: UNAVAILABLE
    hadoop: UNAVAILABLE
    yum_repos: ok
    AMBARI-1.x Ambari 1.x 5
    HDP-UTILS-1.1.0.15 Hortonworks Data Platform Util 52
    zypper_repos: UNAVAILABLE
    ', None)

    #26118

    Hi,

    Was this ever resolved? I’m running into the same issue.

    Thanks,

    Gunnar

    #20137

    tedr
    Member

    Hi Santosh,

    It looks like the main thing missing is NTP. Please make sure that NTP is running on all of the nodes; it is one of the prerequisites. To check, run "service ntpd status" on each node. The time on all nodes must be in sync for SSL to work properly.

    Thanks,
    Ted.
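    One way clock skew can produce "certificate verify failed" (an illustration, not a diagnosis confirmed in this thread): a certificate is only valid between its notBefore and notAfter times, so if a slave's clock is behind the moment the server generated its certificate, the slave treats that certificate as not yet valid. A minimal Python 3 check using the standard library's ssl.cert_time_to_seconds; the function name and the example dates are hypothetical.

```python
import ssl
import time
from typing import Optional


def clock_inside_cert_window(not_before: str, not_after: str,
                             now: Optional[float] = None) -> bool:
    """True if 'now' (epoch seconds, default: local clock) is inside the
    certificate's validity period.

    not_before / not_after use the OpenSSL text form, for example
    "Apr  2 16:19:07 2013 GMT", which ssl.cert_time_to_seconds() parses.
    """
    if now is None:
        now = time.time()
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return start <= now <= end


if __name__ == "__main__":
    # Hypothetical dates; substitute the real ones from your certificate.
    print(clock_inside_cert_window("Apr  2 16:19:07 2013 GMT",
                                   "Apr  2 16:19:07 2014 GMT"))
```

    The real dates can be read with `openssl x509 -noout -dates -in <certificate file>`, which prints notBefore and notAfter in the format shown above.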

    #20010

    Santosh R
    Member

    Hello Akki,
    Thanks for your quick reply. Password-less SSH is working fine: the master can access both slaves without a password. The log shows features like ntpd, puppet, ganglia, ruby and nagios as unavailable on the slaves, and at the end of the log the following appears:

    SSLError: [Errno 1] _ssl.c:490: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
    ', None)
    STDERR
    Connection to slave2.tpbidw.com closed.
    Registering with the server…
    Registration with the server failed.

    I checked all the prerequisites and settings and they are fine. Can you tell from the logs above what is probably missing?

    #19892

    Akki Sharma
    Moderator

    Hello Santosh,

    It seems your password-less ssh is not working between the nodes. Please perform all the steps in the following page to prepare the environment:

    http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_using_Ambari_book/content/ambari-chap1-5.html

    Best Regards,
    Akki
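    A quick way to confirm that password-less SSH really works non-interactively from the master is to run a no-op remote command with BatchMode=yes, which makes ssh fail instead of prompting for a password. A minimal Python 3 sketch; the helper names and the default root user are assumptions (use whichever account you gave Ambari).

```python
import subprocess
from typing import List


def ssh_check_argv(host: str, user: str = "root") -> List[str]:
    """Build the ssh command line; 'true' is a remote no-op."""
    # BatchMode=yes: fail instead of asking for a password or passphrase.
    # ConnectTimeout=5: give up quickly on unreachable hosts.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            f"{user}@{host}", "true"]


def passwordless_ssh_ok(host: str, user: str = "root") -> bool:
    """True if ssh logs in and runs the no-op without any prompt."""
    result = subprocess.run(ssh_check_argv(host, user),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


if __name__ == "__main__":
    for host in ("slave1.example.com", "slave2.example.com"):  # hypothetical hosts
        print(host, passwordless_ssh_ok(host))
```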

    #19887

    Santosh R
    Member

    Here is the remaining part of the log for the slave node:
        self.cachedconnect = security.CachedHTTPSConnection(self.config)
      File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 77, in __init__
        self.connect()
      File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 82, in connect
        self.httpsconn.connect()
      File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 66, in connect
        ca_certs=server_crt)
      File "/usr/lib64/python2.6/ssl.py", line 338, in wrap_socket
        suppress_ragged_eofs=suppress_ragged_eofs)
      File "/usr/lib64/python2.6/ssl.py", line 120, in __init__
        self.do_handshake()
      File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
        self._sslobj.do_handshake()
    SSLError: [Errno 1] _ssl.c:490: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
    ', None)

    STDERR
    Connection to slave2.tpbidw.com closed.
    Registering with the server…
    Registration with the server failed.
