HMC Failure to Start after Restarting CentOS VM

This topic contains 8 replies, has 5 voices, and was last updated by Venkatavaradhan Viswanathan 1 year, 10 months ago.

  • #8625

    John Edwards
    Participant

    Hi,
    After a somewhat complicated install (many, many attempts, each one uncovering one of the bugs you have replied to in past posts; it would be good if these known bugs were collected in a single place, e.g. the install documentation), I had the system running and all was good. Then I had to reboot, and after the reboot nothing worked. I read through many other posts and tried the tip of starting the hmc and hmc-agent services, but this did not work, so I removed and reinstalled HMC. This brought HMC back, but unfortunately it does not recognise that the rest of the system is already installed, just not running.

    My question is: how do I get HMC to skip all the install steps, since everything is already installed, and go straight to the service start screen? You asked for more information, but I'm not sure what to provide, or whether any of it is still useful now that I have reinstalled HMC.

    regards,

    John.


  • #12996

    Thanks, Ted.

    #12993

    tedr
    Member

    Hi Venkatavaradhan,

    Thanks for using HDP.

    Here you go. To stop a daemon manually (which you must do if you started it manually), simply replace "start" with "stop". NOTE: these commands must be run on the node on which the daemons are installed.

    Thanks,
    Ted.

    To start the NameNode
    su - hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode"

    To start the DataNodes
    su - hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"

    To start the Secondary NameNode
    su - hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start secondarynamenode"

    To start the JobTracker
    su - mapred -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start jobtracker"

    To start the history server
    su - mapred -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start historyserver"

    To start the TaskTrackers
    su - mapred -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start tasktracker"

    To start the ZooKeeper servers
    su - zookeeper -c "source /etc/zookeeper/conf/zookeeper-env.sh ; /bin/env ZOOCFGDIR=/etc/zookeeper/conf ZOOCFG=zoo.cfg /usr/lib/zookeeper/bin/zkServer.sh start"

    To start the HBase Master
    su - hbase -c "/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf start master"

    To start the HBase RegionServers
    su - hbase -c "/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf start regionserver"

    To start the HCatalog server (MySQL first, then the Hive metastore)
    /etc/init.d/mysqld start
    su - hive -c "env HADOOP_HOME=/usr nohup hive --service metastore > /var/log/hive/hive.out 2> /var/log/hive/hive.log &"

    To start the Templeton server
    su - templeton -c "/usr/sbin/templeton_server.sh start"

    To start Oozie
    su - oozie -c "cd /var/log/oozie; /usr/lib/oozie/bin/oozie-start.sh"
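For the core HDFS and MapReduce daemons, the commands above differ only in user, daemon name and action, so they can be driven from one small script. This is a hypothetical sketch, not part of HDP: the function names `ordered_daemons` and `run_action`, the `DRY_RUN` switch, and stopping in reverse start order are all assumptions. It also assumes it is run as root on a node where all six daemons are installed, with the paths used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: start or stop the core HDP 1.x HDFS/MapReduce daemons
# on this node. Paths are the ones used in this thread; adjust if yours differ.
HADOOP_DAEMON=/usr/lib/hadoop/bin/hadoop-daemon.sh
HADOOP_CONF=/etc/hadoop/conf

# "user:daemon" pairs in start order (HDFS before MapReduce).
DAEMONS="hdfs:namenode hdfs:datanode hdfs:secondarynamenode
mapred:jobtracker mapred:historyserver mapred:tasktracker"

# List the pairs in the order an action needs: start as listed, stop in
# reverse (reverse order for stop is an assumption, not an HDP requirement).
ordered_daemons() {
  if [ "$1" = stop ]; then
    printf '%s\n' $DAEMONS | tac
  else
    printf '%s\n' $DAEMONS
  fi
}

# Run (or, with DRY_RUN=1, only print) hadoop-daemon.sh for every pair.
run_action() {  # $1 = start | stop
  case "$1" in
    start|stop) ;;
    *) echo "usage: run_action start|stop" >&2; return 1 ;;
  esac
  for pair in $(ordered_daemons "$1"); do
    user=${pair%%:*}
    daemon=${pair#*:}
    cmd="$HADOOP_DAEMON --config $HADOOP_CONF $1 $daemon"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "su - $user -c '$cmd'"
    else
      su - "$user" -c "$cmd"
    fi
  done
}
```

Source the file (e.g. `. ./hdp-core.sh`, a hypothetical name), preview with `DRY_RUN=1 run_action stop`, then run `run_action start` or `run_action stop`. ZooKeeper, HBase, HCatalog, Templeton and Oozie are left out because their commands do not follow the same `hadoop-daemon.sh` pattern.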

    #12985

    Your responses have really helped me so far. You mentioned in this thread that there are two options: either reinstall the cluster from scratch, or start the processes manually and forget about HMC. Can you tell me how to start the processes manually? It would help everyone and also complete this thread.

    #9054

    Sasha J
    Moderator

    Sean,
    Here is the problem:
    HMC writes the file /var/run/hadoop/hdfs/namenode-formatted.
    CentOS 6 deletes all files under /var/run during startup. As a result, after a reboot HMC thinks the NameNode is fresh and needs to be formatted; but the NameNode location is not empty, so the format fails, HMC detects this failure and stops executing the subsequent commands.
    The workaround is straightforward:
    add the following line to the end of the "start" portion of the /etc/init.d/hmc script:
    touch /var/run/hadoop/hdfs/namenode-formatted

    This will fix the problem, and HMC will be able to start services normally after a reboot.
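The workaround is a single `touch`. A slightly more defensive sketch, written as a shell function to call at the end of the "start" portion of /etc/init.d/hmc, is below. The function name `restore_nn_marker` is hypothetical, and the `mkdir -p` (in case the parent directory is also missing after boot) is an assumption beyond the one-line fix given above.

```shell
#!/bin/sh
# Hypothetical helper for the "start" portion of /etc/init.d/hmc.
# Recreates the marker file HMC uses to decide that the NameNode is
# already formatted; CentOS 6 removes files under /var/run at boot.
restore_nn_marker() {
  # Default path from this thread; an argument overrides it (handy for testing).
  marker=${1:-/var/run/hadoop/hdfs/namenode-formatted}
  # Assumption: also recreate the parent directory if it is missing.
  mkdir -p "$(dirname "$marker")" || return 1
  touch "$marker"
}
```

Only use this on a cluster whose NameNode has already been formatted: since the marker's presence presumably tells HMC to skip formatting, creating it before the first install could make HMC skip a format it actually needs.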

    Thank you!
    Sasha

    #9053

    Sean Perry
    Member

    Sasha,
    You mentioned that there is a known problem, and a fix, for CentOS 6.3. Can you let me know what the workaround is? My cluster is up and running, and I have running VM snapshots, but it would be helpful to be able to reboot the VMs when necessary.

    Thanks!
    sean

    #8632

    Sasha J
    Moderator

    John,
    So, you have CentOS 6.3, and there is a known problem with it.
    Its startup sequence is slightly different from older releases (5.x), and as a result some HMC-related files get removed from the system and the hmc process cannot be started. There is a pretty simple workaround for it, but it is too late to apply it, since you have already reinstalled hmc.
    You are right about failure of the hmc host: if it is lost, the whole HMC installation is lost, BUT the cluster itself is still functional and can be used. The HMC recovery procedure is not very straightforward, but it is possible, and the next HMC release will have significant changes in this area.

    For your current situation you have 2 choices:
    1. reinstall the cluster from scratch (which means losing all your test data)
    or
    2. start the processes manually and use the cluster, forgetting about HMC

    It is your choice; please let me know which direction you would like to go.

    Thank you!
    Sasha

    #8630

    John Edwards
    Participant

    Thanks, I've attached the material you asked for, but your answer pretty much confirms my thoughts: only a full reinstall will bring it back to the way it was. Unfortunately I'll lose my test data, but it is only test data. The hmc service restart I was referring to was from one of your posts:
    service hmc start
    service hmc-agent start
    Q. With respect to your last comment, "there is no way to have it working without reinstalling the whole cluster": on a single-node play box this is not really an issue, but I'm starting the evaluation process for a 100+ node cluster, and it doesn't seem right that if you lose the box running HMC, you have lost HMC for good. Is this correct?

    thanks for your help,

    John

    cat /etc/hosts
    ==========
    127.0.0.1 localhost.localdomain localhost
    10.211.55.15 centos.localdomain centos
    ::1 localhost6.localdomain6 localhost6

    hostname -f
    ===========
    centos.localdomain

    uname -a
    ========
    Linux CentOS.localdomain 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

    cat /etc/issue
    ===========
    CentOS release 6.3 (Final)
    Kernel \r on an \m

    /etc/init.d/iptables status
    ===================
    iptables: Firewall is not running.

    sestatus
    =======
    SELinux status: enabled
    SELinuxfs mount: /selinux
    Current mode: permissive
    Mode from config file: enforcing
    Policy version: 24
    Policy from config file: targeted

    service ntpd status
    ==============
    ntpd is stopped

    nslookup `hostname -f`
    ==================
    Server: 10.211.55.1
    Address: 10.211.55.1#53

    ** server can't find centos.localdomain: NXDOMAIN

    ls /etc/yum.repos.d
    ===============
    CentOS-Base.repo CentOS-Debuginfo.repo CentOS-Media.repo CentOS-Vault.repo epel.repo epel-testing.repo hdp.repo

    df -k
    ====
    Filesystem 1K-blocks Used Available Use% Mounted on
    /dev/mapper/vg_centos-lv_root
    51606140 4948872 46133132 10% /
    tmpfs 1961156 296 1960860 1% /dev/shm
    /dev/sda1 495844 38111 432133 9% /boot
    /dev/mapper/vg_centos-lv_home
    9877432 405412 8970260 5% /home
    none 4294967296 0 4294967296 0% /media/psf

    grep fail /var/log/puppet_apply.log
    ==========================

    grep fail /var/log/hmc/hmc.log (These are all from before the system was working)
    ======================
    [2012:08:21 03:13:21][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 03:13:26][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 03:13:26][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [2012:08:21 03:26:20][INFO][PuppetFinalize:txnId=3:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 03:26:25][INFO][PuppetFinalize:txnId=3:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 03:26:25][INFO][PuppetFinalize:txnId=3:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [failed] => Array
    [2012:08:21 04:25:57][ERROR][Cluster:hadoop][Cluster.php:677][_installAllServices]: Puppet kick failed, no successful nodes
    [2012:08:21 04:25:58][INFO][ClusterMain:TxnId=5][ClusterMain.php:353][]: Completed action=deploy on cluster=hadoop, txn=5-0-0, result=-3, error=Puppet kick failed on all nodes
    [2012:08:21 04:25:58][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345523158,"context":{"status":false,"txnId":"5"}}
    [2012:08:21 04:25:58][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345523158,"context":{"status":false,"txnId":"5","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:21 04:41:41][ERROR][UploadFiles][addNodes.php:95][]: Hosts file copy to loc /var/run/hmc/clusters/hadoop/hosts.txt failed
    [failed] => Array
    [2012:08:21 04:43:43][ERROR][Cluster:hadoop][Cluster.php:164][_uninstallAllServices]: Puppet kick failed, no successful nodes
    [2012:08:21 04:43:43][INFO][ClusterMain:TxnId=6][ClusterMain.php:353][]: Completed action=wipeout on cluster=hadoop, txn=6-0-0, result=-3, error=Puppet kick failed on all nodes
    [2012:08:21 04:43:44][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"UNINSTALLED","displayName":"Uninstall failed","timeStamp":1345524224,"context":{"status":false,"txnId":"6"}}
    [2012:08:21 04:43:44][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"UNINSTALLED","displayName":"Uninstall failed","timeStamp":1345524224,"context":{"status":false,"txnId":"6","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:21 04:47:12][ERROR][UploadFiles][addNodes.php:95][]: Hosts file copy to loc /var/run/hmc/clusters/hadoop/hosts.txt failed
    [2012:08:21 04:56:34][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 04:56:39][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 04:56:39][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [failed] => Array
    [2012:08:21 05:06:33][ERROR][Cluster:hadoop][Cluster.php:677][_installAllServices]: Puppet kick failed, no successful nodes
    [2012:08:21 05:06:33][INFO][ClusterMain:TxnId=3][ClusterMain.php:353][]: Completed action=deploy on cluster=hadoop, txn=3-0-0, result=-3, error=Puppet kick failed on all nodes
    [2012:08:21 05:06:36][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345525596,"context":{"status":false,"txnId":"3"}}
    [2012:08:21 05:06:36][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345525596,"context":{"status":false,"txnId":"3","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:21 08:25:20][ERROR][UploadFiles][addNodes.php:95][]: Hosts file copy to loc /var/run/hmc/clusters/hadoop/hosts.txt failed
    [2012:08:21 13:22:56][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 13:23:01][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 13:23:01][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [failed] => Array
    [2012:08:21 13:47:11][ERROR][Cluster:HadoopPlay][Cluster.php:677][_installAllServices]: Puppet kick failed, no successful nodes
    [2012:08:21 13:47:11][INFO][ClusterMain:TxnId=3][ClusterMain.php:353][]: Completed action=deploy on cluster=HadoopPlay, txn=3-0-0, result=-3, error=Puppet kick failed on all nodes
    [2012:08:21 13:47:13][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345556833,"context":{"status":false,"txnId":"3"}}
    [2012:08:21 13:47:13][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345556833,"context":{"status":false,"txnId":"3","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:21 14:03:10][INFO][PuppetFinalize:txnId=4:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 14:03:15][INFO][PuppetFinalize:txnId=4:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 14:03:15][INFO][PuppetFinalize:txnId=4:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [failed] => Array
    [2012:08:21 14:05:45][ERROR][Cluster:HadoopPlay][Cluster.php:677][_installAllServices]: Puppet kick failed, no successful nodes
    [2012:08:21 14:05:45][INFO][ClusterMain:TxnId=6][ClusterMain.php:353][]: Completed action=deploy on cluster=HadoopPlay, txn=6-0-0, result=-3, error=Puppet kick failed on all nodes
    [2012:08:21 14:05:46][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345557946,"context":{"status":false,"txnId":"6"}}
    [2012:08:21 14:05:46][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345557946,"context":{"status":false,"txnId":"6","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:21 23:51:35][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 23:51:40][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:21 23:51:40][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [failed] => Array
    [2012:08:21 23:53:53][ERROR][Cluster:HadoopTest][Cluster.php:677][_installAllServices]: Puppet kick failed, no successful nodes
    [2012:08:21 23:53:53][INFO][ClusterMain:TxnId=3][ClusterMain.php:353][]: Completed action=deploy on cluster=HadoopTest, txn=3-0-0, result=-3, error=Puppet kick failed on all nodes
    [2012:08:21 23:53:55][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345593235,"context":{"status":false,"txnId":"3"}}
    [2012:08:21 23:53:55][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345593235,"context":{"status":false,"txnId":"3","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:22 02:06:12][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=0, failedHostsCount=1
    [2012:08:22 02:06:17][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:22 02:06:17][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [failed] => Array
    [2012:08:22 02:09:06][ERROR][Cluster:HadoopTest][Cluster.php:677][_installAllServices]: Puppet kick failed, no successful nodes
    [2012:08:22 02:09:06][INFO][ClusterMain:TxnId=3][ClusterMain.php:353][]: Completed action=deploy on cluster=HadoopTest, txn=3-0-0, result=-3, error=Puppet kick failed on all nodes
    [2012:08:22 02:09:07][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345601347,"context":{"status":false,"txnId":"3"}}
    [2012:08:22 02:09:07][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345601347,"context":{"status":false,"txnId":"3","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:22 03:04:26][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:22 03:04:31][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:22 03:04:31][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [failed] => Array
    [2012:08:22 03:25:01][ERROR][Service: HBASE (HadoopTest)][Service.php:473][smoke]: Service smoke check failed with Array
    [failed] => Array
    [2012:08:22 03:25:01][INFO][ClusterMain:TxnId=3][ClusterMain.php:353][]: Completed action=deploy on cluster=HadoopTest, txn=3-0-0, result=-2, error=Service HBASE is not STARTED, smoke tests failed!
    [2012:08:22 03:25:04][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345605904,"context":{"status":false,"txnId":"3"}}
    [2012:08:22 03:25:04][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345605904,"context":{"status":false,"txnId":"3","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:22 04:05:49][ERROR][sequentialScriptExecutor][sequentialScriptRunner.php:272][]: Encountered total failure in transaction 100 while running cmd: /usr/bin/php ./addNodes/findSshableNodes.php with args: HortonWorks root 1 100 2 /var/run/hmc/clusters/HortonWorks/hosts.txt
    [2012:08:22 04:07:40][INFO][PuppetFinalize:txnId=3:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:22 04:07:45][INFO][PuppetFinalize:txnId=3:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:22 04:07:45][INFO][PuppetFinalize:txnId=3:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [failed] => Array
    [failed] => Array
    [failed] => Array
    [2012:08:22 04:13:05][ERROR][ServiceComponent:NAMENODE][ServiceComponent.php:283][start]: Puppet kick failed, no successful nodes
    [2012:08:22 04:13:05][INFO][ClusterMain:TxnId=5][ClusterMain.php:353][]: Completed action=deploy on cluster=HortonWorks, txn=5-0-0, result=-3, error=Failed to start DATANODE with -3 (\'Failed to start NAMENODE with -3 (\'Puppet kick failed on all nodes\')\')
    [2012:08:22 04:13:05][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345608785,"context":{"status":false,"txnId":"5"}}
    [2012:08:22 04:13:05][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345608785,"context":{"status":false,"txnId":"5","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:22 04:17:07][ERROR][UploadFiles][addNodes.php:95][]: Hosts file copy to loc /var/run/hmc/clusters/HortonWorks/hosts.txt failed
    [failed] => Array
    [failed] => Array
    [failed] => Array
    [2012:08:22 04:23:09][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:22 04:23:14][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:22 04:23:14][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [failed] => Array
    [2012:08:22 04:34:56][ERROR][Cluster:HortonWorks][Cluster.php:677][_installAllServices]: Puppet kick failed, no successful nodes
    [2012:08:22 04:34:56][INFO][ClusterMain:TxnId=3][ClusterMain.php:353][]: Completed action=deploy on cluster=HortonWorks, txn=3-0-0, result=-3, error=Puppet kick failed on all nodes
    [2012:08:22 04:34:59][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345610099,"context":{"status":false,"txnId":"3"}}
    [2012:08:22 04:34:59][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"DEPLOYED","displayName":"Deploy failed","timeStamp":1345610099,"context":{"status":false,"txnId":"3","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:22 04:37:32][ERROR][UploadFiles][addNodes.php:95][]: Hosts file copy to loc /var/run/hmc/clusters/HortonWorks/hosts.txt failed
    [failed] => Array
    [2012:08:22 04:38:10][ERROR][Cluster:HortonWorks][Cluster.php:164][_uninstallAllServices]: Puppet kick failed, no successful nodes
    [2012:08:22 04:38:10][INFO][ClusterMain:TxnId=4][ClusterMain.php:353][]: Completed action=wipeout on cluster=HortonWorks, txn=4-0-0, result=-3, error=Puppet kick failed on all nodes
    [2012:08:22 04:38:11][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"UNINSTALLED","displayName":"Uninstall failed","timeStamp":1345610291,"context":{"status":false,"txnId":"4"}}
    [2012:08:22 04:38:11][INFO][ClusterState][clusterState.php:40][updateClusterState]: Update Cluster State with {"state":"UNINSTALLED","displayName":"Uninstall failed","timeStamp":1345610291,"context":{"status":false,"txnId":"4","isInPostProcess":false,"postProcessSuccessful":true}}
    [2012:08:22 04:41:27][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:177][sign_and_verify_agent]: Puppet cert sign status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:22 04:41:32][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:256][sign_and_verify_agent]: Puppet agent ping status, totalHosts=1, succeededHostsCount=1, failedHostsCount=0
    [2012:08:22 04:41:32][INFO][PuppetFinalize:txnId=1:subTxnId=104][finalizeNodes.php:390][]: Puppet finalize, succeeded for 1 and failed for 0 of total 1 hosts
    [failed] => Array

    #8626

    Sasha J
    Moderator

    John,
    As I stated earlier, you have not provided much information here.
    What is your OS version?
    What do you mean by "tried the hmc and hmc-agent service start tip but this did not work"? How did it not work? What error message did you see?
    In any case, you can start all the processes manually, but once you have removed hmc, there is no way to get it working again without reinstalling the whole cluster.
    Please be more specific about your versions, configurations, etc.
    Please send us the output of the following commands:

    cat /etc/hosts
    hostname -f
    uname -a
    cat /etc/issue
    /etc/init.d/iptables status
    sestatus
    service ntpd status
    nslookup `hostname -f`
    ls /etc/yum.repos.d
    df -k
    grep fail /var/log/puppet_apply.log
    grep fail /var/log/hmc/hmc.log
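To gather all of these in one pass, with each command followed by a line of "=" and then its output, a small script can run them in turn. This is only a sketch: `section` and `collect_diag` are hypothetical names, and the output file in the usage line is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: print a command, an "=" underline of the same
# length, then the command's combined stdout/stderr and a blank line.
section() {
  printf '%s\n' "$1"
  printf '%s\n' "$1" | sed 's/./=/g'
  sh -c "$1" 2>&1
  echo
}

# Run every diagnostic command requested above. Single quotes are deliberate:
# $(hostname -f) is expanded by sh -c inside section, not here.
collect_diag() {
  section 'cat /etc/hosts'
  section 'hostname -f'
  section 'uname -a'
  section 'cat /etc/issue'
  section '/etc/init.d/iptables status'
  section 'sestatus'
  section 'service ntpd status'
  section 'nslookup $(hostname -f)'
  section 'ls /etc/yum.repos.d'
  section 'df -k'
  section 'grep fail /var/log/puppet_apply.log'
  section 'grep fail /var/log/hmc/hmc.log'
}
```

As root, source the file and run `collect_diag > /tmp/hdp-diag.txt`, then attach that file to the reply.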

    Thank you!
    Sasha
