HDP on Linux – Installation Forum

Ganglia server/monitor Down

  • #28082
    Ardavan Moinzadeh
    Participant

    I had a cluster of 3 nodes running perfectly with no issues (all services running). I added 3 new nodes to the cluster following all the on-screen instructions (including restarting Ganglia so that it would recognize the 3 new nodes). Now Ganglia is down on all 6 nodes and I cannot bring it back up. I tried starting it through Ambari: the nodes were green for just a few seconds and then went back to orange (slaves) and red (master node). I also tried to force a restart of Ganglia on each node by running “service gmond restart”. It reports that the Ganglia service was restarted, but the restart does not take effect.
    Can someone please help me out here? I am out of ideas!
    Best regards,
    Ardavan

  • #28191
    Sasha J
    Moderator

    Did you reboot your nodes?
    Stop Ganglia from Ambari if it is enabled there.
    If not, go to each node and run “/etc/init.d/hdp-gmetad stop” on the Ganglia server and “/etc/init.d/gmond stop” on all nodes.
    On the server, also run:
    /etc/init.d/gmetad stop
    chkconfig gmetad off

    On all nodes, also run:
    /etc/init.d/gmond stop
    chkconfig gmond off

    Then go back to Ambari and start Ganglia from there.

    Thank you!
    Sasha
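
The sequence Sasha describes can be sketched as a small dry-run helper. It only prints the commands for each role so they can be reviewed and then run by hand as root. The function name `ganglia_stop_commands` and the role labels `server` and `node` are made up for this illustration; the commands themselves are the ones quoted in the post.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the stop/disable commands from the
# post above. The role labels "server" and "node" are hypothetical.
ganglia_stop_commands() {
    role="$1"
    case "$role" in
        server)
            # Ganglia server: stop the stock gmetad and disable it at boot.
            echo "/etc/init.d/gmetad stop"
            echo "chkconfig gmetad off"
            ;;
        node)
            # Every node: stop the stock gmond and disable it at boot.
            echo "/etc/init.d/gmond stop"
            echo "chkconfig gmond off"
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}
```

On the Ganglia server both lists apply, so `ganglia_stop_commands server; ganglia_stop_commands node` prints everything to run there. Starting Ganglia again is still done from Ambari, as the post says.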

    #28378
    Ardavan Moinzadeh
    Participant

    Sasha,
    Thank you for your response. Interestingly, only the first 3 nodes that I started my cluster with are now back to green; the 3 new nodes are still orange (down).

    I tried to shut down hdp-gmond on those 3 new nodes and got this error:
    [X@b-0014 ~]$ sudo service hdp-gmond status
    =======================================
    Checking status of hdp-gmond…
    =======================================
    Failed to find running /usr/sbin/gmond for cluster HDPSlaves
    [X@b-0014 ~]$ sudo service hdp-gmond status
    =======================================
    Checking status of hdp-gmond…
    =======================================
    Failed to find running /usr/sbin/gmond for cluster HDPSlaves

    Now that the first 3 nodes, especially the master node, are up, I think we are getting somewhere. :P Any other suggestions?

    Thank you

    #28412
    Seth Lyubich
    Moderator

    Hi Ardavan,

    Please make sure that the gmond service is not running on the nodes, as Sasha suggested in the last post. Please check:

    On all nodes, also do:
    /etc/init.d/gmond stop
    chkconfig gmond off

    Once this is done, you can try to restart Ganglia.

    Hope this helps,

    Thanks,
    Seth

    #28413
    Ardavan Moinzadeh
    Participant

    Seth,
    As I mentioned before, only the original 3 nodes came up after I ran those commands. For some reason the second group that I added to the cluster is not affected at all. I tested this several times!

    #28528
    Sasha J
    Moderator

    Try to regenerate Ganglia configurations for those 3 new nodes.
    Check here: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.0/bk_hdp1-system-admin-guide/content/admin_add-nodes-4.html
    for more information.

    Thank you!
    Sasha

    #28539
    Ardavan Moinzadeh
    Participant

    Sasha, I followed all the steps on that link and I get this error on the last one,
    ===> scp root@$existing_node:/etc/init.d/hdp-gmond /etc/init.d
    /etc/init.d/hdp-gmond start

    start: No such file or directory!

    #28540
    Ardavan Moinzadeh
    Participant

    I fixed the error, but it is still not coming up!

    #28543
    Ardavan Moinzadeh
    Participant

    Hardware information such as memory, local disk, and CPU is also not available for these new nodes. Is there a way to remove these nodes from my cluster and start the process over?

    Thank you

    #28558
    Robert
    Participant

    Hi Ardavan,
    At the moment, Ambari supports decommissioning of nodes, but not removing a node from the cluster completely (uninstalling the Hadoop components and updating the Ambari database so that the node is no longer recognized). Can you provide the output of rpm -qa | grep ganglia on the 3 nodes having the issue and verify that the RPM packages installed there match the ones on the nodes that are working as expected?

    Regards,
    Robert
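
The comparison Robert asks for can be sketched as follows, assuming the output of `rpm -qa | grep ganglia` from a working node and from a failing node has been saved to two files. The function name and the file arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: compare Ganglia package lists captured from two nodes.
# Each input file is assumed to hold the output of `rpm -qa | grep ganglia`
# from one node; the file names are hypothetical.
compare_ganglia_rpms() {
    good_list="$1"
    bad_list="$2"
    sorted_good="$(mktemp)" || return 2
    sorted_bad="$(mktemp)" || return 2
    # comm requires sorted input.
    sort "$good_list" > "$sorted_good"
    sort "$bad_list" > "$sorted_bad"
    # comm -3 drops lines common to both files. Column 1 is "only on the
    # working node"; column 2 (tab-indented) is "only on the failing node".
    differences="$(comm -3 "$sorted_good" "$sorted_bad")"
    rm -f "$sorted_good" "$sorted_bad"
    if [ -z "$differences" ]; then
        echo "package lists match"
    else
        echo "$differences"
        return 1
    fi
}
```

For example, `compare_ganglia_rpms working-node.txt new-node.txt` prints only the packages that appear on one side, which is usually quicker to read than two full listings.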

    #28572
    Ardavan Moinzadeh
    Participant

    Robert,
    this is the result:
    libganglia-3.2.0-99.x86_64
    ganglia-gmond-3.2.0-99.x86_64
    Thank you

    #28583
    Ardavan Moinzadeh
    Participant

    One more question:
    After installing Ambari, where are the gmond and gmetad log files located?
    Thank you

    #28590
    Yi Zhang
    Moderator

    Hi Ardavan,

    HDP only installs Ganglia plugins; it does not change the Ganglia settings. Ganglia usually logs to /var/log/messages.
    This page may be helpful:
    http://sourceforge.net/apps/trac/ganglia/wiki/FAQ

    Thanks,
    Yi

    #28624
    Ardavan Moinzadeh
    Participant

    Yi,

    When I try to check the data on gmond with
    /usr/bin/gstat -a
    I get the following error:
    “Unable to get hostlist from localhost 8649!”
    What is causing this error?
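
The message suggests that gstat could not reach a gmond listener on localhost port 8649 (host and port are taken from the error text). One hedged way to test that, assuming a bash shell with `/dev/tcp` redirection support, is to try opening a TCP connection to that port. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: check whether anything accepts TCP connections on a host/port.
# Relies on bash's /dev/tcp redirection (an assumption about the shell in use).
port_is_open() {
    local host="$1" port="$2"
    # The subshell opens the connection and closes it again on exit.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

report_gmond_port() {
    local host="${1:-localhost}" port="${2:-8649}"
    if port_is_open "$host" "$port"; then
        echo "connection to ${host}:${port} succeeded"
    else
        echo "connection to ${host}:${port} failed"
        return 1
    fi
}
```

Running `report_gmond_port` with no arguments checks localhost:8649. A failed connection is consistent with gmond not running on the node, but it does not by itself say why.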

    #28636
    tedr
    Moderator

    Hi Ardavan,

    When logged into the shell on any of the nodes where Ganglia is not working, what do you get when you run the command:
    ps -eaf | grep gmond

    or

    ps -eaf | grep gmetad

    ?

    Thanks,
    Ted.

    #28656
    Ardavan Moinzadeh
    Participant

    Ted, this is the result when I run ps -eaf | grep gmond on one of the down nodes:
    myusername 44535 44506 0 09:36 pts/0 00:00:00 grep gmond

    and when I run the gmetad command:
    myusername 44619 44506 0 09:38 pts/0 00:00:00 grep gmetad
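
In both outputs above, the only matching line is the grep command itself, which indicates that neither gmond nor gmetad is running on that node. A small sketch of the same check that filters out the grep line and prints a count (the function name is hypothetical):

```shell
#!/bin/sh
# Sketch: count processes matching a daemon name in `ps -eaf` output read from
# stdin, ignoring any grep commands that show up in the listing.
count_daemon_procs() {
    daemon="$1"
    # grep -c prints the count but exits non-zero when it is 0, hence `|| true`.
    grep -v grep | grep -c "$daemon" || true
}
```

Usage: `ps -eaf | count_daemon_procs gmond`. A result of 0 corresponds to the situation shown in this post.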

    #28688
    tedr
    Moderator

    Hi Ardavan,

    Can you look through the /var/log/httpd/error_log file to see if there is a hint as to why ganglia isn’t starting?

    Thanks,
    Ted.

    #28698
    Ardavan Moinzadeh
    Participant

    Ted,
    here are some of the logs :
    error_log-20130630:
    ———————————————-
    [Sun Jun 23 03:17:02 2013] [notice] Digest: generating secret for digest authentication …
    [Sun Jun 23 03:17:02 2013] [notice] Digest: done
    [Sun Jun 23 03:17:02 2013] [warn] mod_wsgi: Compiled for Python/2.6.2.
    [Sun Jun 23 03:17:02 2013] [warn] mod_wsgi: Runtime using Python/2.6.6.
    [Sun Jun 23 03:17:02 2013] [notice] Apache/2.2.15 (Unix) DAV/2 PHP/5.4.14 mod_ssl/2.2.15 OpenSSL/1.0.0-fips mod_wsgi/3.2 Python/2.6.6 mod_perl/2.0.4 Perl/v5.10.1 configured — resuming normal operations
    [Mon Jun 24 11:50:21 2013] [error] [client 15.195.201.86] PHP Warning: gettimeofday(): It is not safe to rely on the system’s timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone ‘UTC’ for now, but please set date.timezone to select your timezone. in /var/www/html/ganglia/ganglia.php on line 395
    [Mon Jun 24 11:50:21 2013] [error] [client 15.195.201.86] PHP Warning: gettimeofday(): It is not safe to rely on the system’s timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone ‘UTC’ for now, but please set date.timezone to select your timezone. in /var/www/html/ganglia/ganglia.php on line 412
    [Mon Jun 24 11:50:21 2013] [error] [client 15.195.201.86] PHP Warning: gettimeofday(): It is not safe to rely on the system’s timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone ‘UTC’ for now, but please set date.timezone to select your timezone. in /var/www/html/ganglia/ganglia.php on line 395
    [Mon Jun 24 11:50:21 2013] [error] [client 15.195.201.86] PHP Warning: gettimeofday(): It is not safe to rely on the system’s timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone ‘UTC’ for now, but please set date.timezone to select your timezone. in /var/www/html/ganglia/ganglia.php on line 412
    [Mon Jun 24 2013] [error] [client 15.195.201.86] PHP Warning: date(): It is not safe to rely on the system’s timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled

    #28700
    Ardavan Moinzadeh
    Participant

    part 2===access_log_20130630:

    [28/Jun/2013:10:55:24 -0500] "GET /ganglia/graph.php?c=HDPSlaves&h=bddec3v2-0015&v=0&m=jvm.metrics.logWarn&r=hour&z=small&jr=&js=&st=1372434919 HTTP/1.1" 200 7347 "http://X.X.X/ganglia/?r=hour&cs=&ce=&m=load_one&s=by+name&c=HDPSlaves&h=bddec3v2-0015&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; BOIE9;ENUS)"
    ==============================

    #28703
    Ardavan Moinzadeh
    Participant

    Also, when I try to start hdp-gmond with sudo service hdp-gmond start, I get the following error:
    “Failed to find running /usr/sbin/gmond for cluster HDPSlaves”

    Thanks

    #28707
    tedr
    Moderator

    Hi Ardavan,

    Could you post the complete logs to http://ftp.hortonworks.com (username=dropoff, password=hadoop)?

    Thanks,
    Ted.

    #28709
    Ardavan Moinzadeh
    Participant

    Ted,
    I uploaded the files “error_log-20130630”, “ssl_error_log-20130630”, and “access_log-20130630”.
    Thanks

    #28711
    tedr
    Moderator

    Hi Ardavan,

    Thanks, I’ll give them a look over and get back to you.

    Thanks,
    Ted.

    #29059
    Ardavan Moinzadeh
    Participant

    Anything from the log files?

    #29073
    Sasha J
    Moderator

    Ardavan,
    The HTTP logs have nothing to do with Ganglia not showing graphs correctly.

    Please check whether hdp-gmetad is running on your Ganglia server and hdp-gmond is running on all nodes.
    Also, check the folder /etc/ganglia/hdp on all nodes.
    It should show something like this:
    [root@bimota2 hdp]# ls -l
    total 24
    -rw-r--r-- 1 root root 5877 Jul 1 15:46 gmetad.conf
    drwxr-xr-x 3 root hadoop 4096 Jun 12 18:24 HDPHBaseMaster
    drwxr-xr-x 3 root hadoop 4096 Jun 12 18:24 HDPJobTracker
    drwxr-xr-x 3 root hadoop 4096 Jun 12 18:24 HDPNameNode
    drwxr-xr-x 3 root hadoop 4096 Jun 12 18:24 HDPSlaves
    [root@bimota2 hdp]#

    Also, run “chkconfig | grep gmond” on all nodes; it should show:
    [root@bimota3 ~]# chkconfig | grep gmond
    gmond 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    [root@bimota3 ~]#

    The same should hold for gmetad on the Ganglia server node.

    Get back to us with the information.

    Thank you!
    Sasha
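
The chkconfig check above can be sketched as a small helper that takes the listing text and reports whether the service is off in every runlevel. The function name is hypothetical, and the sketch assumes the listing format shown in the post.

```shell
#!/bin/sh
# Sketch: decide whether `chkconfig` listing text shows a service disabled in
# every runlevel, e.g. "gmond 0:off 1:off 2:off 3:off 4:off 5:off 6:off".
all_runlevels_off() {
    line="$1"
    case "$line" in
        *:on*) return 1 ;;   # at least one runlevel is enabled
        *:off*) return 0 ;;  # runlevels listed, none enabled
        *) return 2 ;;       # empty or not a chkconfig listing
    esac
}
```

Usage: `all_runlevels_off "$(chkconfig | grep gmond)" && echo "gmond is disabled at boot"`. If the grep matches more than one service line, a single `:on` anywhere in the text makes the check fail.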

    #29131
    Ardavan Moinzadeh
    Participant

    Sasha, here are the results I got following your suggestions:
    /etc/ganglia/hdp : ls -l
    Master node:
    -rw-r--r--. 1 root root 5901 Jul 2 16:12 gmetad.conf
    drwxr-xr-x. 3 root hadoop 4096 Jun 10 14:43 HDPHBaseMaster
    drwxr-xr-x. 3 root hadoop 4096 Jun 10 14:44 HDPJobTracker
    drwxr-xr-x. 3 root hadoop 4096 Jun 10 14:43 HDPNameNode
    drwxr-xr-x. 3 root hadoop 4096 Jun 28 11:27 HDPSlaves
    Slave nodes:
    Slave 1: total 4
    drwxr-xr-x. 3 root root 4096 Jun 10 14:44 HDPSlaves
    Slave 2: total 8
    drwxr-xr-x. 3 root hadoop 4096 Jun 10 14:45 HDPJobTracker
    drwxr-xr-x. 3 root root 4096 Jun 10 14:45 HDPSlaves

    Slave 3: total 4
    drwxr-xr-x. 3 root root 4096 Jun 17 15:58 HDPSlaves

    Slave 4: total 4
    drwxr-xr-x. 3 root root 4096 Jun 10 14:44 HDPSlaves

    Slave 5: total 4
    drwxr-xr-x. 3 root root 4096 Jun 17 15:57 HDPSlaves

    and this is what I get when I run this command chkconfig |grep gmetad
    chkconfig | grep gmond
    gmond 0:off 1:off 2:off 3:off 4:off 5:off 6:off

    Thanks

    #29132
    Ardavan Moinzadeh
    Participant

    As a reminder, the first 3 nodes that I started my cluster with have Ganglia up and running; Ganglia is down only on the group of 3 that I added later.

    #29159
    tedr
    Moderator

    Hi Ardavan,

    On the three nodes where Ganglia is down, what do you get when you run ‘ps -eaf | grep gmond’?

    Thanks,
    Ted.

    #29160
    Ardavan Moinzadeh
    Participant

    Ted,
    This is what I get:
    User 31839 31802 0 15:59 pts/0 00:00:00 grep gmond
    User 46653 46628 0 15:59 pts/0 00:00:00 grep gmond
    User 21220 21183 0 15:59 pts/0 00:00:00 grep gmond

    #29220
    tedr
    Moderator

    Hi Ardavan,

    The only possible hint I see in the logs is that the time on the three new nodes might not be in sync with the rest.

    Thanks,
    Ted.
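
One hedged way to follow up on the clock hint is to collect `date +%s` from an old node and a new node and compare the two values. The sketch below only does the arithmetic; the function name is hypothetical, and how the timestamps are gathered (for example over ssh) is left to the reader.

```shell
#!/bin/sh
# Sketch: print the absolute difference, in seconds, between two epoch
# timestamps such as values collected by running `date +%s` on two nodes.
clock_skew_seconds() {
    a="$1"
    b="$2"
    delta=$((a - b))
    if [ "$delta" -lt 0 ]; then
        delta=$((0 - delta))
    fi
    echo "$delta"
}
```

For example, `clock_skew_seconds "$(date +%s)" "$remote_epoch"`, where `remote_epoch` is a placeholder for the value read on the other node. Any delay in collecting the remote value adds to the reported difference, so only a skew well beyond a few seconds is meaningful.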
