Error starting Nagios

This topic contains 4 replies, has 4 voices, and was last updated by  Larry Liu 1 year, 8 months ago.

    #13208

    Trang Nguyen
    Member

    Hi,

    During HMC installation, I get the following error when the Nagios services start. The Nagios log shows:

    [1357363206] SERVICE ALERT: xdc-tst-mapre-002.openmarket.com;GANGLIA::Ganglia [gmetad] Process down;CRITICAL;SOFT;1;Connection refused
    [1357363216] SERVICE ALERT: xdc-tst-mapre-002.openmarket.com;GANGLIA::Ganglia [gmetad] Process down;CRITICAL;SOFT;2;Connection refused
    [1357363236] SERVICE ALERT: xdc-tst-mapre-002.openmarket.com;GANGLIA::Ganglia [gmetad] Process down;CRITICAL;SOFT;3;Connection refused
    [1357363246] SERVICE ALERT: xdc-tst-mapre-002.openmarket.com;GANGLIA::Ganglia [gmetad] Process down;CRITICAL;HARD;4;Connection refused
    [1357363246] SERVICE NOTIFICATION: nagiosadmin;xdc-tst-mapre-002.openmarket.com;GANGLIA::Ganglia [gmetad] Process down;CRITICAL;notify-service-by-email;Connection refused
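
    The bracketed numbers in these log lines are Unix epoch seconds, which makes them hard to line up with process start times. A minimal sketch for decoding them, assuming bash and GNU date are available (the function name nagios_ts is hypothetical, not part of Nagios):

```shell
# Rewrite a leading "[epoch]" on each input line as a readable UTC timestamp.
# Assumes GNU date (for "-d @SECONDS"); lines without a leading epoch pass through unchanged.
nagios_ts() {
  local line
  local re='^\[([0-9]+)\] (.*)$'
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '[%s] %s\n' \
        "$(date -u -d "@${BASH_REMATCH[1]}" '+%Y-%m-%d %H:%M:%S UTC')" \
        "${BASH_REMATCH[2]}"
    else
      printf '%s\n' "$line"
    fi
  done
}
```

    It reads from standard input, for example: nagios_ts < /path/to/nagios.log (the log path depends on the installation).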

    The Ganglia processes did install and start up successfully:
    nobody 31241 1 0 00:17 ? 00:00:00 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPNameNode/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPNameNode/gmond.pid
    nobody 31258 1 0 00:17 ? 00:00:00 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseMaster/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPHBaseMaster/gmond.pid
    nobody 31275 1 0 00:17 ? 00:00:03 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPSlaves/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPSlaves/gmond.pid
    nobody 31292 1 0 00:17 ? 00:00:00 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPJobTracker/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPJobTracker/gmond.pid

    Could someone assist?

    Thanks,
    Trang


    #16118

    Larry Liu
    Moderator

    Can you please reinstall HDP using Ambari 1.2?

    Larry

    #16114

    27p 92g
    Member

    I had a similar problem. I was seeing these messages:

    Service	Status	Last Check	Duration	Attempt	Status Information
    GANGLIA::Ganglia collector [gmond] Process down alert for hbasemaster	CRITICAL	02-15-2013 12:37:18	0d 20h 30m 58s	4/4	Connection refused
    GANGLIA::Ganglia collector [gmond] Process down alert for jobtracker	CRITICAL	02-15-2013 12:37:25	0d 20h 30m 53s	4/4	Connection refused
    GANGLIA::Ganglia collector [gmond] Process down alert for namenode	CRITICAL	02-15-2013 12:37:17	0d 20h 30m 47s	4/4	Connection refused
    GANGLIA::Ganglia collector [gmond] Process down alert for slaves	CRITICAL	02-15-2013 12:37:24	0d 20h 30m 42s	4/4	Connection refused
    

    On the Nagios host, this seemed to make the errors go away:

    sudo service hdp-gmond start
    

    To make the fix permanent, add the following to the header of the init script /etc/init.d/hdp-gmond on the Nagios host:

    # chkconfig: 345 20 80
    # description: Starts and stops the agent
    

    Then add it to the system startup:

    sudo chkconfig --add hdp-gmond
    sudo chkconfig --levels 345 hdp-gmond on
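
    "Connection refused" generally means nothing accepted the TCP connection on the port Nagios probed, so after starting the service it can help to test the port directly. A rough sketch, assuming bash and the coreutils timeout command; port_open and report_port are hypothetical helper names, and the port to test is whatever your gmond.core.conf configures:

```shell
# Succeed if HOST:PORT accepts a TCP connection within 3 seconds.
# Uses bash's built-in /dev/tcp pseudo-device, so no extra tools are needed.
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print a one-line verdict for HOST:PORT.
report_port() {
  if port_open "$1" "$2"; then
    echo "$1:$2 open"
  else
    echo "$1:$2 refused"
  fi
}
```

    For example, run report_port localhost PORT with the gmond port from your configuration. Running chkconfig --list hdp-gmond should also show the service as "on" for runlevels 3, 4, and 5 once it is registered.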
    
    #13226

    tedr
    Member

    Hi Trang,

    Thanks for letting us know that you figured out the solution.

    Thanks,
    Ted.

    #13213

    Trang Nguyen
    Member

    Just an update…

    I was able to resolve this issue by:
    - uninstalling httpd, Nagios, and Puppet on the Nagios box
    - uninstalling HMC and Puppet
    - rerunning HMC

    Trang
