Ganglia server/monitor Down

This topic contains 30 replies, has 6 voices, and was last updated by  tedr 1 year, 3 months ago.

  • Creator
    Topic
  • #28082

    Ardavan Moinzadeh
    Participant

    I had a cluster of 3 nodes running perfectly with no issues (all services running). I added 3 new nodes to the cluster, following all the on-screen instructions (including restarting Ganglia so that it would recognize the 3 new nodes). Now Ganglia is down on all 6 of my nodes and I cannot bring it back up. I tried starting it through Ambari: the nodes were green for just a few seconds, then went back to orange (slaves) and red (master node). I also tried to force a restart of Ganglia on each node with the command "service gmond restart". It says the Ganglia service was restarted, but the restart does not take effect.
    Can someone please help me out here? I am out of ideas!
    Best regards,
    Ardavan

Viewing 30 replies - 1 through 30 (of 30 total)

  • Author
    Replies
  • #29220

    tedr
    Moderator

    Hi Ardavan,

    The only possible hint I see in the logs is that the time on the three new nodes might not be in sync with the rest.

    Thanks,
    Ted.
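One rough way to test the clock hint is to collect `date +%s` from two nodes and compare the values. The sketch below only does the comparison; the function name `clock_skew` is made up for this illustration and is not part of HDP or Ganglia.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
# $1 and $2 are expected to be outputs of `date +%s` taken on two nodes.
clock_skew() {
    d=$(( $1 - $2 ))
    # Flip the sign when the second clock is ahead of the first.
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}
```

It could be called as `clock_skew "$(date +%s)" "$(ssh newnode date +%s)"`, where `newnode` is a placeholder hostname. The SSH round trip adds a little delay of its own, so only a difference of many seconds would be meaningful.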

    #29160

    Ardavan Moinzadeh
    Participant

    Ted,
    This is what I get:
    User 31839 31802 0 15:59 pts/0 00:00:00 grep gmond
    User 46653 46628 0 15:59 pts/0 00:00:00 grep gmond
    User 21220 21183 0 15:59 pts/0 00:00:00 grep gmond

    #29159

    tedr
    Moderator

    Hi Ardavan,

    On the three nodes where Ganglia is down, what do you get when you run 'ps -eaf | grep gmond'?

    Thanks,
    Ted.
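A note on reading this check: `ps -eaf | grep gmond` always lists the `grep` process itself, so a listing that contains only a `grep gmond` line means no gmond daemon is running. The sketch below counts matches while skipping the grep line. It assumes the usual `ps -ef` column layout, with the command in the eighth field, and `count_gmond` is a name invented for this example.

```shell
# Count gmond processes in `ps -eaf` style output read from stdin,
# ignoring the line that belongs to the grep command itself.
count_gmond() {
    # Field 8 is the command in the standard `ps -ef` layout.
    awk '$8 != "grep" && /gmond/ { n++ } END { print n + 0 }'
}
```

Usage would be `ps -eaf | count_gmond`. For a listing that holds nothing but the grep line, it prints 0.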

    #29132

    Ardavan Moinzadeh
    Participant

    As a reminder, the first 3 nodes that I started my cluster with have Ganglia up and running; Ganglia is down only on the group of 3 that I added later.

    #29131

    Ardavan Moinzadeh
    Participant

    Sasha, here are the results that I got following your suggestions:
    /etc/ganglia/hdp : ls -l
    Master node:
    -rw-r--r--. 1 root root 5901 Jul 2 16:12 gmetad.conf
    drwxr-xr-x. 3 root hadoop 4096 Jun 10 14:43 HDPHBaseMaster
    drwxr-xr-x. 3 root hadoop 4096 Jun 10 14:44 HDPJobTracker
    drwxr-xr-x. 3 root hadoop 4096 Jun 10 14:43 HDPNameNode
    drwxr-xr-x. 3 root hadoop 4096 Jun 28 11:27 HDPSlaves
    Slave nodes:
    Slave 1:total 4
    drwxr-xr-x. 3 root root 4096 Jun 10 14:44 HDPSlaves
    Slave 2: total 8
    drwxr-xr-x. 3 root hadoop 4096 Jun 10 14:45 HDPJobTracker
    drwxr-xr-x. 3 root root 4096 Jun 10 14:45 HDPSlaves

    Slave 3: total 4
    drwxr-xr-x. 3 root root 4096 Jun 17 15:58 HDPSlaves

    Slave 4: total 4
    drwxr-xr-x. 3 root root 4096 Jun 10 14:44 HDPSlaves

    Slave 5:total 4
    drwxr-xr-x. 3 root root 4096 Jun 17 15:57 HDPSlaves

    and this is what I get when I run this command chkconfig |grep gmetad
    chkconfig | grep gmond
    gmond 0:off 1:off 2:off 3:off 4:off 5:off 6:off

    Thanks

    #29073

    Sasha J
    Moderator

    Ardavan,
    The httpd logs have nothing to do with Ganglia not showing graphs correctly.

    Please check whether hdp-gmetad is running on your Ganglia server and hdp-gmond is running on all nodes.
    Also, check folder /etc/ganglia/hdp on all nodes.
    It should show something like this:
    [root@bimota2 hdp]# ls -l
    total 24
    -rw-r--r-- 1 root root 5877 Jul 1 15:46 gmetad.conf
    drwxr-xr-x 3 root hadoop 4096 Jun 12 18:24 HDPHBaseMaster
    drwxr-xr-x 3 root hadoop 4096 Jun 12 18:24 HDPJobTracker
    drwxr-xr-x 3 root hadoop 4096 Jun 12 18:24 HDPNameNode
    drwxr-xr-x 3 root hadoop 4096 Jun 12 18:24 HDPSlaves
    [root@bimota2 hdp]#

    Also, run "chkconfig | grep gmond" on all nodes; it should show:
    [root@bimota3 ~]# chkconfig | grep gmond
    gmond 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    [root@bimota3 ~]#

    The same should hold for gmetad on the Ganglia server node.

    Get back to us with the information.

    Thank you!
    Sasha
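The chkconfig check described above can be scripted as a small text test. This is only a sketch: `all_runlevels_off` is a helper name invented here, and it inspects the text that chkconfig prints rather than querying the service itself.

```shell
# Succeed only if the given chkconfig output shows "off" at every runlevel.
# $1: chkconfig text, e.g. "gmond 0:off 1:off 2:off 3:off 4:off 5:off 6:off"
all_runlevels_off() {
    case "$1" in
        *:on*)  return 1 ;;   # at least one runlevel is enabled
        *:off*) return 0 ;;   # runlevels are listed and none is "on"
        *)      return 1 ;;   # empty or unexpected input
    esac
}
```

For example: `all_runlevels_off "$(chkconfig | grep gmond)" || echo "stock gmond is still enabled"`.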

    #29059

    Ardavan Moinzadeh
    Participant

    Anything from the log files?

    #28711

    tedr
    Moderator

    Hi Ardavan,

    Thanks, I’ll give them a look over and get back to you.

    Thanks,
    Ted.

    #28709

    Ardavan Moinzadeh
    Participant

    Ted,
    I uploaded the files "error_log-20130630", "ssl_error_log-20130630" and "access_log-20130630".
    Thanks

    #28707

    tedr
    Moderator

    Hi Ardavan,

    Could you post the complete logs to http://ftp.hortonworks.com username=dropoff password=hadoop.

    Thanks,
    Ted.

    #28703

    Ardavan Moinzadeh
    Participant

    Also, when I try to start hdp-gmond with "sudo service hdp-gmond start", I get the following error:
    "Failed to find running /usr/sbin/gmond for cluster HDPSlaves"

    Thanks

    #28700

    Ardavan Moinzadeh
    Participant

    part 2===access_log_20130630:

    [28/Jun/2013:10:55:24 -0500] "GET /ganglia/graph.php?c=HDPSlaves&h=bddec3v2-0015&v=0&m=jvm.metrics.logWarn&r=hour&z=small&jr=&js=&st=1372434919 HTTP/1.1" 200 7347 "http://X.X.X/ganglia/?r=hour&cs=&ce=&m=load_one&s=by+name&c=HDPSlaves&h=bddec3v2-0015&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; BOIE9;ENUS)"
    ==============================

    #28698

    Ardavan Moinzadeh
    Participant

    Ted,
    here are some of the logs :
    error_log-20130630:
    ----------------------------------------------
    [Sun Jun 23 03:17:02 2013] [notice] Digest: generating secret for digest authentication ...
    [Sun Jun 23 03:17:02 2013] [notice] Digest: done
    [Sun Jun 23 03:17:02 2013] [warn] mod_wsgi: Compiled for Python/2.6.2.
    [Sun Jun 23 03:17:02 2013] [warn] mod_wsgi: Runtime using Python/2.6.6.
    [Sun Jun 23 03:17:02 2013] [notice] Apache/2.2.15 (Unix) DAV/2 PHP/5.4.14 mod_ssl/2.2.15 OpenSSL/1.0.0-fips mod_wsgi/3.2 Python/2.6.6 mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
    [Mon Jun 24 11:50:21 2013] [error] [client 15.195.201.86] PHP Warning: gettimeofday(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /var/www/html/ganglia/ganglia.php on line 395
    [Mon Jun 24 11:50:21 2013] [error] [client 15.195.201.86] PHP Warning: gettimeofday(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /var/www/html/ganglia/ganglia.php on line 412
    [Mon Jun 24 11:50:21 2013] [error] [client 15.195.201.86] PHP Warning: gettimeofday(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /var/www/html/ganglia/ganglia.php on line 395
    [Mon Jun 24 11:50:21 2013] [error] [client 15.195.201.86] PHP Warning: gettimeofday(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in /var/www/html/ganglia/ganglia.php on line 412
    [Mon Jun 24 2013] [error] [client 15.195.201.86] PHP Warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled

    #28688

    tedr
    Moderator

    Hi Ardavan,

    Can you look through the /var/log/httpd/error_log file to see if there is a hint as to why ganglia isn’t starting?

    Thanks,
    Ted.

    #28657

    Ardavan Moinzadeh
    Participant

    Ted, this is the result when I run ps -eaf | grep gmond on one of the down nodes:
    myusername 44535 44506 0 09:36 pts/0 00:00:00 grep gmond

    and when I run the gmetad command:
    myusername 44619 44506 0 09:38 pts/0 00:00:00 grep gmetad

    Thank you

    #28636

    tedr
    Moderator

    Hi Ardavan,

    When logged into the shell on any of the nodes where Ganglia isn't working, what do you get when you run the command:
    ps -eaf | grep gmond

    or

    ps -eaf | grep gmetad

    ?

    Thanks,
    Ted.

    #28624

    Ardavan Moinzadeh
    Participant

    Yi,

    When I try to check the data on gmond with
    /usr/bin/gstat -a
    I get the following error:
    "Unable to get hostlist from localhost 8649!"
    What is causing this error?

    #28590

    Yi Zhang
    Moderator

    Hi Ardavan,

    HDP only installs Ganglia plugins; it does not change the Ganglia settings. Ganglia logs usually go to /var/log/messages.
    This FAQ is helpful:

    http://sourceforge.net/apps/trac/ganglia/wiki/FAQ

    Thanks,
    Yi
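Since /var/log/messages is shared with every other service on the node, it can help to pull out only the Ganglia-related lines. A minimal sketch, assuming the daemons tag their syslog entries with their own names (gmond, gmetad); the function name `ganglia_log_lines` is invented for this example.

```shell
# Read syslog text on stdin and print only lines mentioning gmond or gmetad.
ganglia_log_lines() {
    # `|| true` keeps the exit status at 0 when nothing matches.
    grep -E 'gmond|gmetad' || true
}
```

Usage would be `ganglia_log_lines < /var/log/messages`, run as a user allowed to read that file.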

    #28583

    Ardavan Moinzadeh
    Participant

    One more question:
    After installing Ambari, where are the gmond and gmetad log files located?
    Thank you

    #28572

    Ardavan Moinzadeh
    Participant

    Robert,
    this is the result:
    libganglia-3.2.0-99.x86_64
    ganglia-gmond-3.2.0-99.x86_64
    Thank you

    #28558

    Robert
    Participant

    Hi Ardavan,
    At the moment, Ambari supports decommissioning nodes, but not removing a node from the cluster completely (uninstalling the Hadoop components and updating the Ambari database so that the node is no longer recognized). Can you provide the output of rpm -qa | grep ganglia on the 3 nodes having the issue, and verify that the rpm packages installed there match the ones on the nodes that are working as expected?

    Regards,
    Robert

    #28543

    Ardavan Moinzadeh
    Participant

    Hardware information such as memory, local disk, and CPU is also not available for these new nodes. Is there a way to remove these nodes from my cluster and start the process again?

    Thank you

    #28540

    Ardavan Moinzadeh
    Participant

    I fixed the error! Ganglia is still not coming up!

    #28539

    Ardavan Moinzadeh
    Participant

    Sasha, I followed all the steps on that link and I get this error on the last one:
    ===> scp root@$existing_node:/etc/init.d/hdp-gmond /etc/init.d
    /etc/init.d/hdp-gmond start

    start: No such file or directory!

    #28528

    Sasha J
    Moderator

    Try to regenerate Ganglia configurations for those 3 new nodes.
    Check here: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.0/bk_hdp1-system-admin-guide/content/admin_add-nodes-4.html
    for more information.

    Thank you!
    Sasha

    #28413

    Ardavan Moinzadeh
    Participant

    Seth,
    As I mentioned before, only the original 3 nodes came up after I ran those commands. For some reason the second group that I added to the cluster is not affected at all. I tested this several times!

    #28412

    Seth Lyubich
    Keymaster

    Hi Ardavan,

    Please make sure that the gmond service is not running on the nodes, as Sasha suggested in the last post.

    On all nodes, do:
    /etc/init.d/gmond stop
    chkconfig gmond off

    Once this is done, you can try to restart Ganglia.

    Hope this helps,

    Thanks,
    Seth

    #28378

    Ardavan Moinzadeh
    Participant

    Sasha,
    Thank you for your response. Interestingly, only the first 3 nodes that I started my cluster with are now back to green; the 3 new nodes are still orange (down).

    I tried to shut down hdp-gmond on those 3 new nodes and got this error:
    [X@b-0014 ~]$ sudo service hdp-gmond status
    =======================================
    Checking status of hdp-gmond…
    =======================================
    Failed to find running /usr/sbin/gmond for cluster HDPSlaves
    [X@b-0014 ~]$ sudo service hdp-gmond status
    =======================================
    Checking status of hdp-gmond…
    =======================================
    Failed to find running /usr/sbin/gmond for cluster HDPSlaves

    Now that the first 3, especially the master node, are up, I think we are getting somewhere. :P Any other suggestions?

    Thank you

    #28191

    Sasha J
    Moderator

    Did you reboot your nodes?
    Stop Ganglia from Ambari if it is enabled.
    If not, go to each node and do "/etc/init.d/hdp-gmetad stop" on the Ganglia server and "/etc/init.d/gmond stop" on all nodes.
    On server, also do:
    /etc/init.d/gmetad stop
    chkconfig gmetad off

    On all nodes, also do:
    /etc/init.d/gmond stop
    chkconfig gmond off

    Then go back to Ambari and start Ganglia from there.

    Thank you!
    Sasha
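The "also do" steps above, which stop and disable the stock gmetad and gmond services, can be collected into one helper. This is a sketch rather than an official HDP script: the names `run` and `disable_stock_ganglia` and the `DRY_RUN` variable are invented for illustration. By default it only prints the commands, so they can be reviewed before anything is changed.

```shell
# Print the command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop and disable the stock Ganglia init scripts quoted in the reply above.
# $1: "server" on the Ganglia server node, anything else on an ordinary node.
disable_stock_ganglia() {
    role=$1
    if [ "$role" = "server" ]; then
        run /etc/init.d/gmetad stop
        run chkconfig gmetad off
    fi
    run /etc/init.d/gmond stop
    run chkconfig gmond off
}
```

Calling `disable_stock_ganglia server` prints the four commands for the Ganglia server node. Setting `DRY_RUN=0` and running as root would execute them, after which Ganglia can be started from Ambari as described.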
