Cluster uninstallation failed

This topic contains 9 replies, has 4 voices, and was last updated by  tedr 1 year, 9 months ago.

  • Creator
    Topic
  • #11962

    Shyam Kadari
    Member

    Hello:
    I am installing HDP (latest version) using “hmc” on five VM nodes. The installation failed because of puppet cert problems.
    I deleted the puppet certs on the nodes so they could be regenerated.
    hmc required me to uninstall before attempting to install again.
    When I uninstalled, the uninstall failed.
    hmc shows the following error:

    Cluster uninstallation failed
    Failed to uninstall the cluster. View the troubleshooting guide.
    What should I do to recover from this uninstall error?

    S

    Here is the part of the error file:

    [2012:11:08 22:44:07][WARN][PuppetInvoker][PuppetInvoker.php:344][waitForResults]: Kick timed out, waited 120 seconds
    [2012:11:08 22:44:08][INFO][PuppetInvoker][PuppetInvoker.php:237][createGenKickWaitResponse]: Response of genKickWait:
    Array
    (
    [result] => 0
    [error] =>
    [nokick] => Array
    (
    )

    [failed] => Array
    (
    [0] => logproc-name-1.highwire.org
    [1] => logproc-data-2.highwire.org
    [2] => logproc-data-3.highwire.org
    [3] => logproc-data-1.highwire.org
    [4] => logproc-data-4.highwire.org
    )

    [success] => Array
    (
    )

    [timedoutnodes] => Array
    (
    [0] => logproc-data-2.highwire.org
    [1] => logproc-data-3.highwire.org
    [2] => logproc-data-1.highwire.org
    [3] => logproc-data-4.highwire.org
    )

    )

    [2012:11:08 22:44:08][INFO][Cluster:HighWireDev01][Cluster.php:150][_uninstallAllServices]: Persisting puppet report for uninstall HDP
    [2012:11:08 22:44:08][ERROR][Cluster:HighWireDev01][Cluster.php:164][_uninstallAllServices]: Puppet kick failed, no successful nodes
    [2012:11:08 22:44:08][INFO][OrchestratorDB][OrchestratorDB.php:610][persistTransaction]: persist: 4-60-2:FAILED: Cluster uninstall:FAILED
    [2012:11:08 22:44:08][INFO][Cluster:HighWireDev01][Cluster.php:1039][setState]: HighWireDev01 – FAILED
    [2012:11:08 22:44:08][INFO][OrchestratorDB][OrchestratorDB.php:610][persistTransaction]: persist: 4-61-60:FAILED:Miscellaneous uninstall:FAILED
    [2012:11:08 22:44:08][INFO][OrchestratorDB][OrchestratorDB.php:556][setServiceState]: MISCELLANEOUS – FAILED
    [2012:11:08 22:44:08][INFO][Service: MISCELLANEOUS (HighWireDev01)][Service.php:130][setState]: MISCELLANEOUS – FAILED dryRun=
    [2012:11:08 22:44:08][INFO][OrchestratorDB][OrchestratorDB.php:610][persistTransaction]: persist: 4-62-60:FAILED:Nagios uninstall:FAILED
    [2012:11:08 22:44:08][INFO][OrchestratorDB][OrchestratorDB.php:556][setServiceState]: NAGIOS – FAILED
    [2012:11:08 22:44:08][INFO][Service: NAGIOS (HighWireDev01)][Service.php:130][setState]: NAGIOS – FAILED dryRun=
    [2012:11:08 22:44:08][INFO][OrchestratorDB][OrchestratorDB.php:610][persistTransaction]: persist: 4-63-60:FAILED:Ganglia uninstall:FAILED
    [2012:11:08 22:44:08][INFO][OrchestratorDB][OrchestratorDB.php:556][setServiceState]: GANGLIA – FAILED
    [2012:11:08 22:44:08][INFO][Service: GANGLIA (HighWireDev01)][Service.php:130][setState]: GANGLIA – FAILED
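
When reading a long hmc.log like the excerpt above, it can help to filter out the INFO noise first. The following is a minimal sketch, assuming each entry starts with `[timestamp][LEVEL][component]` as shown in this post. The function name `hmc_problems` is made up for illustration, and the log path is passed as an argument because its location is not stated in this thread.

```shell
# Print only WARN and ERROR entries from an HMC-style log file.
# Assumption: entries look like "[timestamp][LEVEL][component]...".
# Leading whitespace is tolerated in case the log was copied from a web page.
hmc_problems() {
    grep -E '^[[:space:]]*\[[^]]*\]\[(WARN|ERROR)\]\[' "$1"
}
```

Run against the excerpt above, this would leave the "Kick timed out" warning and the "Puppet kick failed, no successful nodes" error, which are the two lines that explain why every service was marked FAILED.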

Viewing 9 replies - 1 through 9 (of 9 total)


  • Author
    Replies
  • #12127

    tedr
    Member

    T Chenna Krishna,

    Have you fully set up passwordless ssh to all nodes on the cluster? Have you copied your ssh key to the other boxes?

    Ted
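
One way to answer the passwordless-ssh question above is to try a non-interactive login to every node. This is only a sketch: the function name `check_passwordless` and the `SSH_CMD` override are made up for illustration, it assumes logins as root, and it assumes a hosts file with one hostname per line. `BatchMode=yes` makes ssh fail instead of prompting for a password.

```shell
# Report which hosts accept a key-based (non-interactive) root login.
# Usage: check_passwordless HOSTFILE
# SSH_CMD is a hypothetical override so the ssh binary can be swapped out.
check_passwordless() {
    hostfile=$1
    failed=0
    while read -r host; do
        [ -z "$host" ] && continue
        # </dev/null keeps ssh from swallowing the rest of the host list
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
               "root@$host" true </dev/null 2>/dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$hostfile"
    return $failed
}
```

For example, `check_passwordless hostname.txt` would print one OK or FAIL line per node and return non-zero if any node refused the key.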

    #12121

    Yes, I am running the cluster as root only.

    #12028

    tedr
    Member

    T Chenna Krishna,

    ssh-keygen is the right tool for generating the keys. Are you running the installer as root or as some other user?

    Ted.

    #12027

    Thanks Shyam,
    I changed the file, but I still get the same error.
    I used the following Linux command to generate the key:
    ssh-keygen
    Is this correct or not?
    Could you please advise?

    #11971

    tedr
    Member

    T Chenna Krishna,

    It looks like what went wrong is that you pointed HMC at your public key not the private key that it needs. Please try pointing it at id_rsa not id_rsa.pub.

    Ted.
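
A quick way to tell the two key files apart is to look at the first line: a private key file starts with a `-----BEGIN ... PRIVATE KEY-----` header, while a public key file is a single line starting with the key type, such as `ssh-rsa`. The sketch below relies only on that; the function name `key_kind` is made up for illustration.

```shell
# Print "private", "public", or "unknown" for an SSH key file,
# judging only by its first line.
key_kind() {
    first_line=$(head -n 1 "$1") || return 2
    case "$first_line" in
        *"PRIVATE KEY-----"*) echo private ;;
        ssh-rsa\ *|ssh-dss\ *|ssh-ed25519\ *|ecdsa-sha2-*) echo public ;;
        *) echo unknown ;;
    esac
}
```

With keys generated by a default `ssh-keygen` run as root, `key_kind /root/.ssh/id_rsa` should report a private key and `key_kind /root/.ssh/id_rsa.pub` a public one.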

    Shyam,

    Could you run the script talked about here: http://hortonworks.com/community/forums/topic/hmc-installation-support-help-us-help-you/ and upload the results?

    Ted.

    #11970

    Shyam Kadari
    Member

    Hello Robert:

    I have uninstalled and reinstalled hmc and puppet multiple times and tried to create a cluster.
    At different points of the installation, different puppet agents failed to complete.

    A) All the RPMs are already installed on all the nodes.
    B) The VMs are running with a SAN as storage.

    C) The following is from hmc.log on node logproc-name-1:

    [2012:11:09 06:29:47][WARN][PuppetInvoker][PuppetInvoker.php:344][waitForResults]: Kick timed out, waited 120 seconds
    [2012:11:09 06:29:48][INFO][PuppetInvoker][PuppetInvoker.php:292][genKickWait]: Kick attempt (2/3)
    [2012:11:09 06:29:49][INFO][PuppetInvoker][PuppetInvoker.php:97][sendKick]: logproc-name-1 previous kick still running, will continue to wait
    [2012:11:09 06:29:49][INFO][PuppetInvoker][PuppetInvoker.php:332][waitForResults]: Waiting for results from logproc-name-1.highwire.org
    [2012:11:09 06:29:49][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-58-57
    [2012:11:09 06:30:54][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 1 out of 1 nodes have reported for txn 3-58-57
    [2012:11:09 06:30:55][INFO][PuppetInvoker][PuppetInvoker.php:237][createGenKickWaitResponse]: Response of genKickWait:
    Array
    (
    [result] => 0
    [error] =>
    [nokick] => Array
    (
    )

    [failed] => Array
    (
    [0] => logproc-name-1
    )

    [success] => Array
    (

    Your help would be much appreciated.

    Shyam Kadari

    #11967

    I generated the key using the following command:
    ssh-keygen

    For the root private key file path I entered /root/.ssh/id_rsa.pub
    For the hosts file I entered hostname.txt, which contains the hostname.

    I still get the following error:

    Failed. Reason: Permission denied, please try again.
    Permission denied, please try again.
    Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

    Can anyone please help me?

    Regards,
    T Chenna krishna

    #11966

    Robert
    Participant

    Hi Shyam,
    You need to uninstall puppet on the node where hmc is installed. You can simply run ‘yum erase puppet’, which should remove puppet along with hmc, since hmc depends on it. After that, reinstall hmc with ‘yum install hmc’, and make sure the clocks are in sync on all the machines. Ideally, enable the ntp service on your nodes so they all sync up. Once you have verified that the times match, run ‘service hmc start’ and try the install again.

    HTH

    Robert
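
To verify that the times match, one option is to collect each node's clock as seconds since the epoch and flag any node that drifts too far from the hmc host. This is a sketch under assumptions: the function name `skewed_hosts` and its input format ("host epoch" per line on stdin) are made up for illustration, and the acceptable skew is a value you choose.

```shell
# Print "host skew_seconds" for every host whose clock differs from the
# reference by more than the allowed number of seconds.
# Usage: skewed_hosts REFERENCE_EPOCH MAX_SKEW_SECONDS < "host epoch" lines
skewed_hosts() {
    awk -v ref="$1" -v max="$2" '
        NF >= 2 {
            d = $2 - ref
            if (d < 0) d = -d
            if (d > max) print $1, d
        }'
}

# Possible way to gather the input (not run here; assumes root ssh access):
#   while read -r h; do
#       echo "$h $(ssh "root@$h" date +%s </dev/null)"
#   done < hostname.txt | skewed_hosts "$(date +%s)" 5
```

Because the remote clocks are read one after another, a few seconds of apparent skew can come from ssh latency alone, so the threshold should allow for that.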

    #11963

    Shyam Kadari
    Member

    The original problem that led me to uninstall was the following set of puppet errors:

    Thu Nov 08 12:59:25 -0800 2012 Puppet (notice): Using cached catalog
    Thu Nov 08 12:59:25 -0800 2012 Puppet (debug): catalog supports formats: b64_zlib_yaml dot marshal pson raw yaml; using pson
    Thu Nov 08 12:59:25 -0800 2012 Puppet (err): Could not retrieve catalog from remote server: certificate verify failed. This is often because the time is out of sync on the server or client
    Thu Nov 08 12:59:25 -0800 2012 Puppet (info): Not using expired catalog for logproc-data-3.highwire.org from cache; expired at Wed Nov 07 17:59:33 -0800 2012
    Thu Nov 08 12:59:25 -0800 2012 Puppet (notice): Using cached catalog
    Thu Nov 08 12:59:25 -0800 2012 Puppet (err): Could not retrieve catalog; skipping run
    Thu Nov 08 12:59:25 -0800 2012 Puppet (debug): Value of ‘preferred_serialization_format’ (pson) is invalid for report, using default (marshal)
    Thu Nov 08 12:59:25 -0800 2012 Puppet (debug): report supports formats: b64_zlib_yaml marshal raw yaml; using marshal
    Thu Nov 08 12:59:25 -0800 2012 Puppet (err): Could not send report: certificate verify failed. This is often because the time is out of sync on the server or client

    Shyam Kadari
