Installation failed – Add nodes step

This topic contains 28 replies, has 5 voices, and was last updated by  zuhair attya 1 year, 8 months ago.

  • Creator
    Topic
  • #13650

    fbvdka
    Member

    I cannot finalise the installation when I try to add nodes.
    Actually, it failed during installation of ruby-libs.
    When I try yum install ruby-libs, my system tells me that it requires libreadline.so.5 even though compat-readline5 is installed!
    My system is RHEL 6.

Viewing 28 replies - 1 through 28 (of 28 total)

  • Author
    Replies
  • #15582

    zuhair attya
    Member

    I don’t know if you still have the issue, but I found that registering my RHEL server with Red Hat support resolved the dependency issues, if that is what you are seeing. Once the server is registered, yum updates its repos so that it can download all the prerequisites from Red Hat’s public repos. If you cannot register your server, try CentOS; at least you will get all the repos set up automatically.
    Excuse my limited knowledge of Linux… I am just a beginner.. :(

    #14934

    Larry Liu
    Moderator

    Hi, Fabrice,

    Thanks for getting the log for me. Let’s continue working offline.

    Larry

    #14927

    fbvdka
    Member

    I ran ambari-server with -Djavax.net.debug=ssl:handshake.
    In /var/log/ambari-server/ambari-server.out:

    qtp1402163909-30, fatal error: 46: General SSLEngine problem
    sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors
    %% Invalidated: [Session-22, TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA]
    qtp1402163909-30, SEND TLSv1.2 ALERT: fatal, description = certificate_unknown
    qtp1402163909-30, WRITE: TLSv1.2 Alert, length = 2
    qtp1402163909-30, fatal: engine already closed. Rethrowing javax.net.ssl.SSLHandshakeException: General SSLEngine problem

    #14890

    Larry Liu
    Moderator

    My email: lliu@hortonworks.com
    my phone: 408-645-7043

    #14824

    fbvdka
    Member

    Hi Larry,

    Could you give me your email so that I can send you my phone number?

    #14325

    Larry Liu
    Moderator

    Hi, Fabrice,

    Can we talk about the issue over the phone? What is your phone number?

    Larry

    #14315

    fbvdka
    Member

    The openssl version I have on my master is OpenSSL 1.0.0-fips 29 Mar 2010

    #14314

    fbvdka
    Member

    Hi,
    After reinstalling and removing all keys on the nodes and the master, I now have the following message in /var/log/ambari-agent/ambari-agent.log on a node:

    INFO 2013-01-24 11:29:25,656 Controller.py:103 - Unable to connect to: https://p-h1tfc-serv1.xxx.xx:84$
    Traceback (most recent call last):
    File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 88, in registerWithServer
    response = self.sendRequest(self.registerUrl, data)
    File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 235, in sendRequest
    self.cachedconnect = security.CachedHTTPSConnection(self.config)
    File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 76, in __init__
    self.connect()
    File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 81, in connect
    self.httpsconn.connect()
    File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 65, in connect
    ca_certs=server_crt)
    File "/usr/lib64/python2.6/ssl.py", line 338, in wrap_socket
    suppress_ragged_eofs=suppress_ragged_eofs)
    File "/usr/lib64/python2.6/ssl.py", line 120, in __init__
    self.do_handshake()
    File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
    self._sslobj.do_handshake()
    SSLError: [Errno 8] _ssl.c:490: EOF occurred in violation of protocol

    Still a problem with SSL.

    I tested with the openssl command from a node in debug mode and got the following message:

    SSL_connect:SSLv3 write client key exchange A
    write to 0x28451b0 [0x28572c0] (139 bytes => -1 (0xFFFFFFFFFFFFFFFF))
    SSL_connect:error in SSLv3 write change cipher spec A
    SSL_connect:error in SSLv3 write change cipher spec A
    write:errno=32

    #14203

    Larry Liu
    Moderator

    Hi, Fabrice

    Yes, you can.

    Here is the command:

    openssl req -new -newkey rsa:1024 -nodes -keyout %(keysdir)s/%(hostname)s.key\
    -subj /OU=%(hostname)s/\
    -out %(keysdir)s/%(hostname)s.csr

    I recommend installing on a clean system.

    Larry
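
The `%(keysdir)s` and `%(hostname)s` placeholders in the command above look like Python dictionary-style format fields, so they would need to be substituted before the command can be run by hand. A minimal sketch of that substitution follows. The function name is hypothetical, the host name is an illustrative value, and the keys directory is the agent path mentioned elsewhere in this thread; adjust both to your own setup.

```python
import shlex

# Template as quoted above, joined onto one line.
CSR_TEMPLATE = (
    "openssl req -new -newkey rsa:1024 -nodes "
    "-keyout %(keysdir)s/%(hostname)s.key "
    "-subj /OU=%(hostname)s/ "
    "-out %(keysdir)s/%(hostname)s.csr"
)

def build_csr_command(keysdir, hostname):
    """Fill the template and split it into an argument list for subprocess."""
    filled = CSR_TEMPLATE % {"keysdir": keysdir, "hostname": hostname}
    return shlex.split(filled)

# Illustrative values only (hypothetical host name).
cmd = build_csr_command("/var/lib/ambari-agent/keys", "serv2.example.com")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a host where openssl is installed. Note that this command only produces a private key and a certificate signing request; signing the request is a separate step.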

    #14200

    fbvdka
    Member

    Do you know if it is possible to create the certificates manually with openssl (on the master in /var/lib/ambari-server/keys and on the nodes in /var/lib/ambari-agent/keys) and to test them with openssl?

    #14199

    fbvdka
    Member

    Hi Larry,

    Yes I did. I am pretty sure it is due to certificate or SSL problems.
    Actually I switched from version 0.9, and I cleaned my system beforehand.

    #14080

    Larry Liu
    Moderator

    Hi, Fabrice

    Did you have a clean system when you switched from 1.1 to 1.2? If not, this might be the issue. Please start from a clean OS installation before installing HDP 1.2.

    Thanks

    Larry

    #14056

    fbvdka
    Member

    Hi Larry,

    Identical results for hostname and hostname -f.
    From my point of view, I believe there is a problem with certificate and openssl.

    #13937

    Larry Liu
    Moderator

    Hi Fabrice,

    Can you please run the following command on each host and make sure their results are identical?

    hostname
    hostname -f

    Thanks

    Larry
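
The same comparison can be approximated from Python, which is convenient when checking many hosts from a script. This is a sketch only: `socket.getfqdn()` resolves the name through the system resolver and may not match the output of `hostname -f` in every configuration, so treat a mismatch as a hint to investigate rather than as proof of a problem.

```python
import socket

def hostname_report():
    """Collect the host name and the resolved FQDN for this machine."""
    short = socket.gethostname()
    fqdn = socket.getfqdn()
    return {"hostname": short, "fqdn": fqdn, "identical": short == fqdn}

report = hostname_report()
print(report)
```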

    #13909

    fbvdka
    Member

    I ran the following openssl command from one node (called serv2):
    openssl s_client -connect serv1:8441 -cert serv2.crt -key serv2.key -CAfile ca.crt

    I received the following results:

    CONNECTED(00000003)
    depth=0 C = XX, L = Default City, O = Default Company Ltd
    verify return:1
    140209702324040:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:s23_lib.c:184:
    ---
    Certificate chain
    0 s:/C=XX/L=Default City/O=Default Company Ltd
    i:/C=XX/L=Default City/O=Default Company Ltd
    ---
    Server certificate
    -----BEGIN CERTIFICATE-----
    ..........
    -----END CERTIFICATE-----
    subject=/C=XX/L=Default City/O=Default Company Ltd
    issuer=/C=XX/L=Default City/O=Default Company Ltd
    ---
    Acceptable client certificate CA names
    /C=XX/L=Default City/O=Default Company Ltd
    ---
    SSL handshake has read 2276 bytes and written 2465 bytes
    ---
    New, TLSv1/SSLv3, Cipher is EDH-RSA-DES-CBC3-SHA
    Server public key is 4096 bit
    Secure Renegotiation IS supported
    Compression: NONE
    Expansion: NONE
    SSL-Session:
    Protocol : TLSv1
    Cipher : EDH-RSA-DES-CBC3-SHA
    Session-ID: 50FD175107A7A2B66E6E7D923BC2D919DB10EC03E92B529FE40226FC6B628E3B
    Session-ID-ctx:
    Master-Key: xxxx
    Key-Arg : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Start Time: 1358763857
    Timeout : 300 (sec)
    Verify return code: 0 (ok)
    ---
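
A comparable handshake test can be sketched with Python's standard `ssl` module. The code below assumes a current Python 3 rather than the Python 2.6 used by the agent in this thread, and the function name and its parameters are hypothetical. The file arguments correspond to the `-CAfile`, `-cert` and `-key` options of the command above. A modern Python/OpenSSL build may refuse the old protocol versions or ciphers seen in this output, so a failure here is not necessarily the same failure the agent hit.

```python
import socket
import ssl

def try_handshake(host, port, cafile=None, certfile=None, keyfile=None, timeout=5.0):
    """Attempt a TLS handshake and return (ok, detail)."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # The certificates shown in this thread carry no host name,
    # so host name matching is switched off for the test.
    context.check_hostname = False
    if cafile:
        context.load_verify_locations(cafile=cafile)
    else:
        # Without a CA file, skip verification and only test the handshake.
        context.verify_mode = ssl.CERT_NONE
    if certfile:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, "%s %s" % (tls.version(), tls.cipher()[0])
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return False, "%s: %s" % (type(exc).__name__, exc)
```

For example, `try_handshake("serv1", 8441, cafile="ca.crt", certfile="serv2.crt", keyfile="serv2.key")` mirrors the command from the post, using the same file names.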

    #13908

    fbvdka
    Member

    Hi Larry and tedr,

    I am using DNS. I can successfully ssh to the other nodes from the master, and from the master to itself, passwordless of course.

    #13799

    tedr
    Member

    Hi Fabrice,

    Can you successfully ssh from the node on which Ambari is being run to all other nodes in the cluster? Also check that you can ssh from the Ambari machine to itself. All of this ssh’ing must be passwordless.

    Thanks,
    Ted.
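
One way to script this check is to run ssh in batch mode, where it fails instead of prompting for a password. The sketch below assumes the OpenSSH client is installed; the function names and the host names in the usage note are hypothetical.

```python
import subprocess

def ssh_check_command(host, user=None):
    """Build an ssh command that fails instead of prompting for a password."""
    target = "%s@%s" % (user, host) if user else host
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", target, "true"]

def can_ssh_without_password(host, user=None):
    """Return True if the remote 'true' command runs and exits with status 0."""
    result = subprocess.run(
        ssh_check_command(host, user),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Run from the Ambari machine, something like `[h for h in ["serv1", "serv2"] if not can_ssh_without_password(h)]` would list the hosts that still need key-based access, including the Ambari machine itself if it is in the list.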

    #13797

    Larry Liu
    Moderator

    Hi, Fabrice

    Thanks for providing the test information.

    Are you using /etc/hosts or DNS? If you use /etc/hosts, can you please check if all the hosts in the cluster are in the /etc/hosts on each node?

    Hope this helps.

    Larry
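
For clusters that do rely on /etc/hosts, the check can be sketched as a small parser that reports which cluster hosts have no entry. The addresses and host names in the sample are made up for illustration and the function names are hypothetical; on a real node you would read the text from /etc/hosts instead.

```python
def hosts_in_file(hosts_text):
    """Return every name and alias found in /etc/hosts-style text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        names.update(fields[1:])  # fields[0] is the IP address
    return names

def missing_hosts(hosts_text, cluster_hosts):
    """List the cluster hosts that have no entry in the given hosts text."""
    known = hosts_in_file(hosts_text)
    return [h for h in cluster_hosts if h not in known]

# Illustrative sample, not taken from the cluster in this thread.
sample = """
127.0.0.1   localhost
10.0.0.1    master.example.com master
10.0.0.2    node1.example.com node1   # first worker
"""
cluster = ["master.example.com", "node1.example.com", "node2.example.com"]
print(missing_hosts(sample, cluster))
```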

    #13784

    fbvdka
    Member

    Hi Larry,

    I am trying to install one master and three nodes.

    I ran netstat and got the following results:


    tcp 0 0 0.0.0.0:8441 0.0.0.0:* LISTEN 10434/java
    tcp 0 0 10.192.135.146:8441 10.192.135.148:53701 TIME_WAIT -
    tcp 0 0 10.192.135.146:8441 10.192.135.149:62078 FIN_WAIT2 -
    tcp 0 0 10.192.135.146:8441 10.192.135.148:53703 FIN_WAIT2 -

    In /var/log/ambari-server/ambari-server.log there is a warning:

    09:09:24,332 WARN nio:651 - javax.net.ssl.SSLHandshakeException: General SSLEngine problem

    And on one node, in /var/log/ambari-agent/ambari-agent.log:

    INFO 2013-01-17 16:53:28,979 security.py:48 - SSL Connect being called.. connecting to the server
    INFO 2013-01-17 16:53:29,093 Controller.py:103 - Unable to connect to: https://@ipmaster:8441/agent/v1/register/@ipnode
    Traceback (most recent call last):
    File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 88, in registerWithServer
    response = self.sendRequest(self.registerUrl, data)
    File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 235, in sendRequest
    self.cachedconnect = security.CachedHTTPSConnection(self.config)
    File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 76, in __init__
    self.connect()
    File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 81, in connect
    self.httpsconn.connect()
    File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 65, in connect
    ca_certs=server_crt)
    File "/usr/lib64/python2.6/ssl.py", line 338, in wrap_socket
    suppress_ragged_eofs=suppress_ragged_eofs)
    File "/usr/lib64/python2.6/ssl.py", line 120, in __init__
    self.do_handshake()
    File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
    self._sslobj.do_handshake()
    SSLError: [Errno 8] _ssl.c:490: EOF occurred in violation of protocol

    #13731

    Larry Liu
    Moderator

    Hi, Fabrice

    From your log file, I see the following error: Unable to connect to: https://xxxxx:8441/agent/v1/register/xxxx

    Can you please also check if your ambari master server is able to connect to https://xxxxx:8441/agent/v1/register/xxxx? Please run the following command on ambari master:
    netstat -anp| grep 8441

    If the command above returns nothing, can you please attach the log files from the following directories?

    /var/log/ambari-server
    /var/log/ambari-agent

    Thanks

    Larry
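
The netstat command shows whether something is listening on port 8441 on the master itself. To check the complementary question, whether the port is reachable from an agent node, a plain TCP connection test can help. The sketch below is an assumption-laden helper (the function name is hypothetical) and only tests TCP reachability, not the SSL handshake.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("<master host>", 8441)` from a node returns False when the port is blocked, closed, or the host name does not resolve.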

    #13718

    Larry Liu
    Moderator

    Hi Fabrice,

    Can you please provide the following information?

    1. How many nodes are in your cluster?
    2. Do you have passwordless ssh set up?
    3. What is in /etc/hosts on each node?
    4. What steps have you performed?

    Thanks

    Larry

    #13714

    Larry Liu
    Moderator

    Hi Fabrice,

    Thanks for trying Ambari 1.2. I am looking into the issue you are experiencing.

    Larry

    #13713

    fbvdka
    Member

    I have reinstalled with Ambari 1.2.
    Now I get the following error because the registration failed:
    ('INFO 2013-01-17 14:57:12,147 security.py:48 - SSL Connect being called.. connecting to the server
    INFO 2013-01-17 14:57:12,256 Controller.py:103 - Unable to connect to: https://xxxxx:8441/agent/v1/register/xxxx
    Traceback (most recent call last):
    File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 88, in registerWithServer
    response = self.sendRequest(self.registerUrl, data)
    File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 235, in sendRequest
    self.cachedconnect = security.CachedHTTPSConnection(self.config)
    File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 76, in __init__
    self.connect()
    File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 81, in connect
    self.httpsconn.connect()
    File "/usr/lib/python2.6/site-packages/ambari_agent/security.py", line 65, in connect
    ca_certs=server_crt)
    File "/usr/lib64/python2.6/ssl.py", line 338, in wrap_socket
    suppress_ragged_eofs=suppress_ragged_eofs)
    File "/usr/lib64/python2.6/ssl.py", line 120, in __init__
    self.do_handshake()
    File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
    self._sslobj.do_handshake()
    SSLError: [Errno 8] _ssl.c:490: EOF occurred in violation of protocol
    ', None)

    STDERR
    Connection to xxxxx closed.
    Registering with the server…
    Registration with the server failed.

    #13712

    fbvdka
    Member

    Hi Ted,

    Yes I have.

    #13690

    tedr
    Member

    Hi Fabrice,

    Thanks for trying HDP.

    Have you made sure that the firewall (iptables) is off and SELinux is disabled on all nodes on your cluster?

    Thanks!
    Ted.

    #13689

    fbvdka
    Member

    Finally I have succeeded in installing ruby-libs. I think it was a versioning problem.
    But now, when I try to configure my cluster, I receive the message “Async call failed” at the “Add Nodes” step.

    #13686

    fbvdka
    Member

    We had not installed any version of ruby-libs before.
    Here is the message I received:
    Preparing discovered nodes
    Entry Id : 103
    Final result : TOTALFAILURE
    Progress at the end : : 1 / 4 in progress; 3 failed
    Additional information :
    Host Info
    xxxx
    Failed. Reason: Error: Package: ruby-libs-1.8.7.352-7.el6_2.i686 (G02R01C00)
    Requires: libreadline.so.5
    xxxx:_ERROR_:retcode:[1], CMD:[out=`yum install -y ruby-devel rubygems`]: OUT:[Loaded plugins: priorities, security
    35 packages excluded due to repository priority protections
    Setting up Install Process
    No package rubygems available.
    Resolving Dependencies
    --> Running transaction check
    ---> Package ruby-devel.i686 0:1.8.7.352-7.el6_2 will be installed
    --> Processing Dependency: libruby.so.1.8 for package: ruby-devel-1.8.7.352-7.el6_2.i686
    ---> Package ruby-devel.x86_64 0:1.8.7.352-7.el6_2 will be installed
    --> Running transaction check
    ---> Package ruby-libs.i686 0:1.8.7.352-7.el6_2 will be installed
    --> Processing Dependency: libgdbm.so.2 for package: ruby-libs-1.8.7.352-7.el6_2.i686
    --> Processing Dependency: libreadline.so.5 for package: ruby-libs-1.8.7.352-7.el6_2.i686
    --> Running transaction check
    ---> Package gdbm.i686 0:1.8.0-36.el6 will be installed
    ---> Package ruby-libs.i686 0:1.8.7.352-7.el6_2 will be installed
    --> Processing Dependency: libreadline.so.5 for package: ruby-libs-1.8.7.352-7.el6_2.i686
    --> Finished Dependency Resolution
    You could try using --skip-broken to work around the problem
    You could try running: rpm -Va --nofiles --nodigest]
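
Long yum transcripts like the one above are easier to read once the dependency lines are grouped by package. The sketch below does that with a regular expression; the function name is hypothetical and the sample is a subset of the lines quoted in this post. Note that the failing package carries the i686 suffix. One possible reading, not confirmed in this thread, is that the 32-bit ruby-libs needs a 32-bit libreadline.so.5 while only the 64-bit compat-readline5 is installed.

```python
import re

DEP_RE = re.compile(r"Processing Dependency: (\S+) for package: (\S+)")

def dependencies_by_package(yum_output):
    """Map each package to the set of dependencies yum tried to resolve for it."""
    deps = {}
    for match in DEP_RE.finditer(yum_output):
        dependency, package = match.groups()
        deps.setdefault(package, set()).add(dependency)
    return deps

# Lines copied from the yum output in the post above.
log = """
--> Processing Dependency: libruby.so.1.8 for package: ruby-devel-1.8.7.352-7.el6_2.i686
--> Processing Dependency: libgdbm.so.2 for package: ruby-libs-1.8.7.352-7.el6_2.i686
--> Processing Dependency: libreadline.so.5 for package: ruby-libs-1.8.7.352-7.el6_2.i686
"""
for package, needed in sorted(dependencies_by_package(log).items()):
    print(package, "needs", sorted(needed))
```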

    #13651

    Sasha J
    Moderator

    Fabrice,
    this may be a versioning problem…
    HMC tries to install an exact version of ruby-libs, and if you already have it installed in a different version, the installation may fail.
    Please remove your installed package and rerun the node addition. This should pull and install the expected version.

    Thank you!
    Sasha
