Problem with HMC on Add Nodes step

This topic contains 9 replies, has 3 voices, and was last updated by Sasha J 1 year, 8 months ago.

  • Creator
    Topic
  • #12311

    PD Jain
    Member

    I have been setting up a two-node cluster. I have tested the SSH RSA key for localhost (which is the admin node) and for the other node, so passwordless SSH works from the server node to the cluster node. The Add Nodes step gives no error for the other node, but it fails for the server node (localhost) only. It says: "Failed. Reason: Permission denied, please try again.
    Received disconnect from 127.0.0.1: 2: Too many authentication failures for root". I have run check.sh and uploaded the output to the FTP site.
    ftp> put hmc-out.txt
    local: hmc-out.txt remote: hmc-out.txt
    227 Entering Passive Mode (67,208,64,240,237,223).
    150 Opening BINARY mode data connection for hmc-out.txt
    226 Transfer complete

    Can we find out how HMC checks the SSH key for localhost, so that we can try the same command manually?
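
    For reference, one way to reproduce a passwordless login check by hand is a sketch like the one below. The exact command HMC runs is not shown here, so the key path and options are assumptions; IdentitiesOnly limits the client to the single key given, which often avoids the "Too many authentication failures" disconnect when many keys are being offered:

    # Non-interactive single-key login to localhost as root; BatchMode
    # makes any password prompt fail fast instead of hanging (key path
    # is an assumption, not HMC's actual invocation):
    ssh -i /root/.ssh/id_rsa -o IdentitiesOnly=yes -o BatchMode=yes root@127.0.0.1 'echo ok'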



  • Author
    Replies
  • #12541

    Sasha J
    Moderator

    PD,
    The errors you found in the puppet apply log clearly point to a misconfiguration on the Puppet side (or to certificate problems).

    Wed Dec 05 11:51:43 +0530 2012 Puppet (err): Could not send report: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed. This is often because the time is out of sync on the server or client

    Did you check time synchronization?
    Do you have any other Puppet master in your infrastructure, and are your machines trying to connect to it?
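
    A quick way to verify, assuming the stock NTP tools are installed on these hosts:

    # Compare the clocks on both nodes; the SSL error above is a classic
    # symptom of clock skew between agent and master.
    date
    # Force a one-off sync (stop ntpd first so ntpdate can bind the port;
    # pool.ntp.org is just an example source):
    service ntpd stop
    ntpdate pool.ntp.org
    service ntpd start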

    There are a whole lot of potential problems on the Puppet side, which is why I suggest a clean OS reinstall.

    Consult the Puppet web site on debugging Puppet communication problems.

    Usually, if you have a clean OS installation and follow the step-by-step instructions in http://hortonworks.com/hdp11-hmc-quick-start-guide/, everything works fine.

    Thank you!
    Sasha.

    #12537

    PD Jain
    Member

    Hi

    I am OK to give it another try with a clean machine, however that would take some time again. So I was expecting some debug steps before we come to the conclusion of cleaning up the VMs.
    I can see that there is one puppet process started while cluster setup is in progress:
    /usr/bin/ruby /usr/bin/puppet agent --verbose --confdir=/etc/puppet/agent --listen --runinterval 5 --server --report --no-client --waitforcert 10 --configtimeout 600 --debug --logdest=/var/log/puppet_agent.log --httplog /var/log/puppet_agent_http.log --autoflush --use_cached_catalog

    When I checked /var/log/puppet_agent.log, the messages below appeared multiple times (it looks like the puppet process makes multiple attempts):
    Wed Dec 05 11:51:43 +0530 2012 Puppet (err): Could not retrieve catalog; skipping run
    Wed Dec 05 11:51:43 +0530 2012 Puppet (debug): Value of 'preferred_serialization_format' (pson) is invalid for report, using default (b64_zlib_yaml)
    Wed Dec 05 11:51:43 +0530 2012 Puppet (debug): report supports formats: b64_zlib_yaml marshal raw yaml; using b64_zlib_yaml
    Wed Dec 05 11:51:43 +0530 2012 Puppet (err): Could not send report: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed. This is often because the time is out of sync on the server or client

    So can we say that a Puppet SSL certificate problem or a time-sync problem is causing the issue here? Can we solve this instead of going for the cleanup option?
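
    If it turns out to be a stale certificate rather than clock skew, the usual remedy is to clear the agent's cached certificates and let it re-request one. This is only a sketch; the ssldir depends on the agent's confdir shown above, and the FQDN is a placeholder:

    # Print where this agent keeps its certificates:
    puppet agent --confdir=/etc/puppet/agent --configprint ssldir
    # Remove the cached certificates (substitute the path printed above):
    rm -rf <ssldir>
    # On the master, clean any stale certificate for this agent so a
    # fresh one can be requested on the next run:
    puppet cert --clean <agent-fqdn>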

    #12516

    Sasha J
    Moderator

    PD,
    I suggest you reinstall your machines to get a clean OS, and then use the step-by-step instructions as outlined here:

    http://hortonworks.com/hdp11-hmc-quick-start-guide

    Your uploaded deploy logs do not make any sense, as they do not contain any useful information.

    Please, reinstall your machines and start over.

    Thank you!
    Sasha

    #12499

    PD Jain
    Member

    Thanks for the reply.
    I tried the steps you gave and started a new installation.
    But it failed at the cluster setup stage. There is nothing much in the deploy log other than saying the puppet kick failed; it looks like a puppet sync timeout. I have uploaded the log file to the FTP site as deplylogs2.txt.
    How can we avoid the puppet sync timeout? Is there a parameter that can be set to a longer duration? Can we test the puppet sync or kick command manually to see if we can reproduce the issue?
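
    A sketch of testing it manually (the confdir matches the agent command line quoted above; the FQDN is a placeholder). Note the running agent already sets --configtimeout 600, which is the agent-side knob for slow catalog runs:

    # Run one agent pass by hand to see the full error without HMC:
    puppet agent --confdir=/etc/puppet/agent --test --debug
    # Trigger a run from the master side; this is what a "kick" does
    # (it requires the agent to be running with --listen, as above):
    puppet kick --host <agent-fqdn>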

    #12480

    Sasha J
    Moderator

    PD,
    What do you mean by "couldn't understand how to link those manual steps with the UI-based installation"?
    Just execute commands below:

    yum erase hmc puppet -y
    yum install hmc -y
    yum install -y hadoop hadoop-libhdfs.x86_64 hadoop-native.x86_64 hadoop-pipes.x86_64 hadoop-sbin.x86_64 hadoop-lzo hadoop hadoop-libhdfs.x86_64 hadoop-native.x86_64 hadoop-pipes.x86_64 hadoop-sbin.x86_64 hadoop-lzo hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo hadoop hadoop-libhdfs.x86_64 hadoop-native.x86_64 hadoop-pipes.x86_64 hadoop-sbin.x86_64 hadoop-lzo hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo zookeeper zookeeper hbase hbase hbase mysql-server hive mysql-connector-java hive hcatalog oozie.noarch extjs-2.2-1 oozie-client.noarch pig.noarch sqoop mysql-connector-java templeton templeton-tar-pig-0.0.1.14-1 templeton-tar-hive-0.0.1.14-1 templeton hdp_mon_dashboard hdp_mon_nagios_addons nagios-3.2.3 nagios-plugins-1.4.9 fping net-snmp-utils ganglia-gmetad-3.2.0 ganglia-gmond-3.2.0 gweb hdp_mon_ganglia_addons ganglia-gmond-3.2.0 gweb hdp_mon_ganglia_addons snappy snappy-devel lzo lzo lzo-devel lzo-devel
    service hmc start

    Then connect to HMC UI and perform the installation.

    Thank you!
    Sasha

    #12476

    PD Jain
    Member

    Hi,
    I referred to the post you gave, however I couldn't understand how to link those manual steps with the UI-based installation. So I thought of giving another try with only the Hadoop services installed (to avoid the timeout issue), but somehow that also failed. And when I tried to clean up, it failed at one stage saying "Miscellaneous clean up failed".
    I have uploaded both the install and uninstall logs to the FTP site (deplylogs1.txt and uninstall1-logs). Now if I try to install again, it says that the uninstall is incomplete. So what manual steps should be tried if the uninstallation from the UI does not complete? If we could get cleanup steps for the master node and all cluster nodes, that would really help us.
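
    For what it is worth, a minimal manual reset, assuming the stock Puppet paths (any state HMC keeps outside these paths is not covered here); the yum erase line matches the reply above:

    # On every node: erase the packages, then clear Puppet's state and
    # certificates so a fresh install can start clean.
    yum erase -y hmc puppet
    rm -rf /etc/puppet /var/lib/puppet /var/log/puppet*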

    Also, there is a link that points to the Hortonworks troubleshooting guide, however we are not able to open it.
    docs.hortonworks.com/CURRENT/index.htm#Deploying_Hortonworks_Data_Platform/Using_HMC/Troubleshooting/Troubleshooting_HMC_Deployments.htm

    Thanks !

    #12449

    Sasha J
    Moderator

    PD,

    Most likely you are hitting timeouts during the installation.
    The error in your deploy log is:
    "\"Thu Nov 29 18:46:15 +0530 2012 /Stage[1]/Hdp::Pre_install_pkgs/Hdp::Exec[yum install $pre_installed_pkgs]/Exec[yum install $pre_installed_pkgs]/returns (err): change from notrun to 0 failed: yum install -y hadoop hadoop-libhdfs.x86_64 hadoop-native.x86_64 hadoop-pipes.x86_64 hadoop-sbin.x86_64 hadoop-lzo hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo hadoop hadoop-libhdfs.x86_64 hadoop-native.x86_64 hadoop-pipes.x86_64 hadoop-sbin.x86_64 hadoop-lzo hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo zookeeper hbase hbase mysql-server hive mysql-connector-java hive hcatalog oozie.noarch extjs-2.2-1 oozie-client.noarch pig.noarch sqoop mysql-connector-java hdp_mon_nagios_addons nagios-3.2.3 nagios-plugins-1.4.9 fping net-snmp-utils ganglia-gmetad-3.2.0 ganglia-gmond-3.2.0 gweb hdp_mon_ganglia_addons ganglia-gmond-3.2.0 gweb hdp_mon_ganglia_addons snappy snappy-devel returned 1 instead of one of [0] at /etc/puppet/agent/modules/hdp/manifests/init.pp:255\"",
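
    One way to surface the underlying yum failure is to run the install step by hand on the failing node and read yum's own error output; a sketch with the package list shortened (the full list is in the log line above):

    # Re-run the failing pre-install step manually; repo timeouts,
    # missing packages, etc. will show up directly:
    yum install -y hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo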

    Please consult the following post to resolve the problem:

    http://hortonworks.com/community/forums/topic/puppet-failed-no-cert/

    Thank you!
    Sasha

    #12435

    PD Jain
    Member

    Thanks for your reply. After making the changes in /etc/hosts, I am able to continue the installation. However, it fails in the cluster deploy step with some puppet-related errors. The deploy log is uploaded to the FTP site (deplylogs.txt).

    Can you please check it and let me know what is missing?

    #12313

    Seth Lyubich
    Keymaster

    Hi PD,

    Based on the error, it appears that you are using the 127.0.0.1 IP address for your machine. Can you please reconfigure your server's /etc/hosts file to use the actual IP?

    Your host table should have these two records on both nodes, and you need to be able to resolve both hostnames to the correct IPs:

    10.88.2.151 v-greenplum5.persistent.co.in
    10.88.2.155 v-greenplum4.persistent.co.in
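
    After editing /etc/hosts, a quick check on each node (the hostnames are the examples above):

    # Each name should resolve to its real IP, and the node's own FQDN
    # must not map to 127.0.0.1:
    getent hosts v-greenplum5.persistent.co.in
    getent hosts v-greenplum4.persistent.co.in
    hostname -f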

    Hope this helps,

    Thanks,
    Seth
