Home Forums HDP on Linux – Installation Install, Start and Test stage fails

This topic contains 6 replies, has 3 voices, and was last updated by Sasha J 1 year, 1 month ago.

  • Creator
    Topic
  • #30013

    kaya_roti
    Member

    Hi all,
    During the Cluster Install Wizard, at the Install, Start and Test stage, I received an error saying “Failed to install/start the services”.
    I looked in my ambari-server.log and it shows:
    16:24:10,033 WARN HeartbeatMonitor: 123 – Heartbeat lost from host slave2
    16:24:10,175 WARN HeartbeatMonitor: 123 – Heartbeat lost from host slave1

    How can I solve this heartbeat issue?

    Cheers,
    kaya_roti



  • Author
    Replies
  • #30913

    Sasha J
    Moderator

    First of all, in order to get a heartbeat, ambari-agent has to be started on all nodes. Please check that it is running everywhere.
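
    For example, you could run something like this on each node (a minimal sketch, assuming the default agent log location):

        # check whether the agent is running
        ambari-agent status
        # start it if it is not
        ambari-agent start
        # the agent log usually shows why registration with the server fails
        tail -f /var/log/ambari-agent/ambari-agent.log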

    Thank you!
    Sasha

    #30869

    kaya_roti
    Member

    Hi,

    I would like to know how to check whether all the nodes are in sync. I have already configured all the nodes to connect to a dedicated NTP server. At first I only received a heartbeat from one node; after waiting a long time, I finally received a heartbeat from one more node. I still cannot receive a heartbeat from the remaining 2 nodes. The configuration on all my nodes is the same, so why are some nodes working and some not?
    I would also like to know how to fix the time between the nodes.

    Thanks a lot,
    kaya

    #30669

    Sasha J
    Moderator

    You have to fix the time between the nodes, and you also need to make sure your firewalls are stopped and the nodes can communicate with each other without problems.
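
    For example (a sketch assuming RHEL/CentOS-style init scripts; adjust the service commands for your distribution):

        # force an immediate sync against your NTP server (the one from your ntpstat output)
        service ntpd stop
        ntpdate 54.251.61.122
        service ntpd start
        # stop and disable the firewall on every node
        service iptables stop
        chkconfig iptables off
        # check that each agent host can reach the Ambari server registration port (8440 by default)
        telnet ambarimaster 8440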

    Thank you!
    Sasha

    #30652

    kaya_roti
    Member

    Hi Ted,

    I managed to install all components successfully on my first try thanks to your advice. However, when I tried a second time, I received the same error again. I did as you said and made sure all nodes are connected to the same NTP server, but it is still telling me the heartbeat was lost. Any help would be appreciated.

    ntpstat:
    ambarimaster – synchronised to NTP server (54.251.61.122) at stratum 4 time correct to within 258ms polling server every 64s
    slave1 – synchronised to NTP server (54.251.61.122) at stratum 4 time correct to within 101ms polling server every 64s
    client1 – synchronised to NTP server (54.251.61.122) at stratum 4 time correct to within 213ms polling server every 64s

    #30033

    tedr
    Moderator

    Hi Kaya,

    Please make sure that NTP is installed, enabled and the time is in sync on all nodes in your cluster.
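
    For example, on each node (a sketch assuming a yum-based system such as RHEL/CentOS):

        # install NTP if it is missing
        yum install -y ntp
        # enable it at boot and start it now
        chkconfig ntpd on
        service ntpd start
        # confirm the node reports it is synchronised
        ntpstat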

    thanks,
    Ted.

    #30014

    kaya_roti
    Member

    Here is a more detailed log:
    16:16:50,919 INFO ActionScheduler:309 – Host:slave1, role:DATANODE, actionId:7-1 timed out
    16:16:51,194 WARN ActionScheduler:312 – Host:slave1, role:DATANODE, actionId:7-1 expired
    16:16:51,911 INFO ActionScheduler:309 – Host:slave1, role:GANGLIA_MONITOR, actionId:7-1 timed out
    16:16:51,911 WARN ActionScheduler:312 – Host:slave1, role:GANGLIA_MONITOR, actionId:7-1 expired
    16:16:51,977 INFO ActionScheduler:309 – Host:slave1, role:GANGLIA_SERVER, actionId:7-1 timed out
    16:16:51,978 WARN ActionScheduler:312 – Host:slave1, role:GANGLIA_SERVER, actionId:7-1 expired
    16:16:52,174 INFO ActionScheduler:309 – Host:slave1, role:HBASE_CLIENT, actionId:7-1 timed out
    16:16:52,175 WARN ActionScheduler:312 – Host:slave1, role:HBASE_CLIENT, actionId:7-1 expired
    16:16:52,462 INFO ActionScheduler:309 – Host:slave1, role:HBASE_MASTER, actionId:7-1 timed out
    16:16:52,614 WARN ActionScheduler:312 – Host:slave1, role:HBASE_MASTER, actionId:7-1 expired
    16:16:53,107 INFO ActionScheduler:309 – Host:slave1, role:HBASE_REGIONSERVER, actionId:7-1 timed out
    16:16:53,108 WARN ActionScheduler:312 – Host:slave1, role:HBASE_REGIONSERVER, actionId:7-1 expired
    16:16:53,941 INFO ActionScheduler:309 – Host:slave1, role:HCAT, actionId:7-1 timed out
    16:16:53,942 WARN ActionScheduler:312 – Host:slave1, role:HCAT, actionId:7-1 expired
    16:16:54,492 INFO ActionScheduler:309 – Host:slave1, role:HDFS_CLIENT, actionId:7-1 timed out
    16:16:54,492 WARN ActionScheduler:312 – Host:slave1, role:HDFS_CLIENT, actionId:7-1 expired
    16:16:55,317 INFO ActionScheduler:309 – Host:slave1, role:HIVE_CLIENT, actionId:7-1 timed out
    16:16:55,486 WARN ActionScheduler:312 – Host:slave1, role:HIVE_CLIENT, actionId:7-1 expired
    16:16:55,781 INFO ActionScheduler:309 – Host:slave1, role:MAPREDUCE_CLIENT, actionId:7-1 timed out
    16:16:55,782 WARN ActionScheduler:312 – Host:slave1, role:MAPREDUCE_CLIENT, actionId:7-1 expired
    16:16:56,033 INFO ActionScheduler:309 – Host:slave1, role:NAMENODE, actionId:7-1 timed out
    16:16:56,034 WARN ActionScheduler:312 – Host:slave1, role:NAMENODE, actionId:7-1 expired
    16:16:56,592 INFO ActionScheduler:309 – Host:slave1, role:OOZIE_CLIENT, actionId:7-1 timed out
    16:16:56,593 WARN ActionScheduler:312 – Host:slave1, role:OOZIE_CLIENT, actionId:7-1 expired
    16:16:56,920 INFO ActionScheduler:309 – Host:slave1, role:PIG, actionId:7-1 timed out
    16:16:56,920 WARN ActionScheduler:312 – Host:slave1, role:PIG, actionId:7-1 expired
    16:16:57,471 INFO ActionScheduler:309 – Host:slave1, role:SQOOP, actionId:7-1 timed out
    16:16:57,471 WARN ActionScheduler:312 – Host:slave1, role:SQOOP, actionId:7-1 expired
    16:16:57,557 INFO ActionScheduler:309 – Host:slave1, role:TASKTRACKER, actionId:7-1 timed out
    16:16:57,558 WARN ActionScheduler:312 – Host:slave1, role:TASKTRACKER, actionId:7-1 expired
    16:16:58,243 INFO ActionScheduler:309 – Host:slave1, role:ZOOKEEPER_CLIENT, actionId:7-1 timed out
