Lack of cluster metrics following fresh HDP install on 4 nodes

This topic contains 22 replies, has 9 voices, and was last updated by  Ray Roberts 1 year, 2 months ago.

  • Creator
    Topic
  • #19299

    Hi,

    I recently installed HDP (1.2.2) (for the first time) on a cluster of 4 nodes. All nodes had freshly installed CentOS 6.3. Installation was carried out using Ambari server.

    In short, the problem is that I get no cluster or service metrics in the Ambari Web UI after installation. Wherever Ambari ought to render metrics, the only output is the error message: "No Data There was no data available. Possible reasons include inaccessible Ganglia service."

    Scanning Hortonworks’ resources, I tried some of the remedies listed in the sections under http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_Monitoring_Hadoop_Book/content/monitor-chap3-6-11.html but none of them produced any metrics.

    Anyone with a hint up their sleeves? :)

    Some background on the installation: the only config changes made during installation were reducing the DFS replication factor from 3 to 2 and adding passwords where the installer requested them. All HDP services were installed.

    I have two master nodes and two slave nodes. Master 1 runs the following services: ambari server, secondary name node, nagios server, ganglia collector and zookeeper. Master 2 runs name node, job tracker, hive server 2, hive metastore, web hcat server, hbase master, oozie server and zookeeper.

Viewing 22 replies - 1 through 22 (of 22 total)

  • Author
    Replies
  • #36485

    Ray Roberts
    Participant

    Hi,

    So, I’m not sure if I’m reading this correctly, but is this thread stating that Ganglia doesn’t support IPv6? The Ganglia site claims to support it.

    I’m running HDP 1.3.2 on CentOS 6.4 and I’m receiving the “There was no data available. Possible reasons include inaccessible Ganglia service.” message.

    I have IPv4 enabled as well and I’ve tried Javier’s solution with no success. What else could cause this?

    Thanks,

    Ray

    #25967

    Jeff Sposetti
    Moderator

    Hi All,

    Looks like you may have encountered this issue. Sounds like you found the workaround but in general, you may want to watch this JIRA to see when a formal fix gets committed in the Ambari project.

    https://issues.apache.org/jira/browse/AMBARI-2057

    Cheers,

    Jeff

    #25965

    Hi,
    This weekend I have been trying Ambari 1.2.3. I got the same problem with Ganglia when I restarted the cluster services for maintenance. The problem for me was that there were two sets of Ganglia services: the default one provided by CentOS (gmetad) and the one provided by Ambari (hdp-gmetad, hdp-gmond). Log in to the server that provides the Ganglia service and:
    1.- Stop all Ganglia related services:
    /etc/init.d/gmetad stop
    /etc/init.d/hdp-gmetad stop
    /etc/init.d/hdp-gmond stop

    2.- Disable auto start of Ganglia services:
    chkconfig gmetad off

    For me, only gmetad was included. If you also have gmond, repeat the previous command for gmond too. Verify that all Ganglia-related services are correctly disabled by checking the output of the ‘chkconfig’ command.

    3.- From the Ambari admin web console, start Ganglia (server and monitor) using the action buttons for the host that provides the Ganglia services.
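
The verification in step 2 can be sketched as a small filter. The helper below reads `chkconfig --list` output and prints any stock (non-HDP) Ganglia service that is still enabled in some runlevel. The function name `stock_ganglia_enabled` is made up for illustration, and the sketch assumes the usual `name 0:off 1:off ... 6:off` layout that chkconfig prints on CentOS 6.

```shell
# Reads `chkconfig --list` output on stdin and prints the names of stock
# (non-HDP) Ganglia services that are still enabled in any runlevel.
# Hypothetical helper; assumes one service per line with "N:on"/"N:off" columns.
stock_ganglia_enabled() {
  awk '$1 ~ /^(gmond|gmetad)$/ && /:on/ { print $1 }'
}

# Usage on the Ganglia host (as root):
#   chkconfig --list | stock_ganglia_enabled
```

No output means neither stock service is set to start at boot.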

    Hope that helps.
    Regards,
    Javier

    #20257

    Seth Lyubich
    Keymaster

    Hi Jens,

    Happy to hear that the issue is resolved. Currently IPv4 is supported. You can find more info here – http://wiki.apache.org/hadoop/HadoopIPv6. Can you please let us know whether other services used IPv6, or just Ganglia?

    Thanks for following up and using HDP.

    Seth

    #20210

    tedr
    Member

    Hi Jens,

    When you look at the “Services” tab in Ambari, does Ganglia have a red or a green dot? If it is red and you have tried to start it up, that means for some reason it failed to start. The usual suspect for Ganglia failing to start is that it was started by the system rather than Ambari, and thus Ambari cannot get any data from it. The way to correct this is to kill all of the gmond and gmetad processes currently running and then start Ganglia from inside Ambari.

    I hope this helps,
    Ted.

    #20209

    Hi Seth,

    Disabling IPv6 was the solution to my initial HDP installation misery. :) Now the metrics are nicely rendered in Ambari.
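
The thread does not say how IPv6 was disabled. One common way on CentOS 6 is to set `net.ipv6.conf.all.disable_ipv6 = 1` and `net.ipv6.conf.default.disable_ipv6 = 1` in /etc/sysctl.conf and reload with `sysctl -p`; that is an assumption, not a record of what was done on this cluster. The sketch below only checks the result, and the helper name `ipv6_disabled` is hypothetical.

```shell
# Reads "key = value" lines (as printed by `sysctl KEY...`) on stdin and
# succeeds only if both the "all" and "default" disable_ipv6 keys are set to 1.
# Hypothetical helper, not part of HDP or Ambari.
ipv6_disabled() {
  awk -F' *= *' '
    $1 ~ /^net\.ipv6\.conf\.(all|default)\.disable_ipv6$/ {
      seen++
      if ($2 != 1) bad = 1
    }
    END { ok = (seen == 2 && bad == 0); exit !ok }'
}

# Usage on each node:
#   sysctl net.ipv6.conf.all.disable_ipv6 net.ipv6.conf.default.disable_ipv6 \
#     | ipv6_disabled && echo "IPv6 is disabled"
```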

    One final question lingers, though: do HDP and/or Ambari support IPv6? My search for the truth here did not provide a conclusive answer.

    Thanks Seth, Ted, Mike, Robert and Larry for your great input.

    #19970

    Seth Lyubich
    Keymaster

    Hi Jens,

    You should use IPv4. Please check network setup and name resolution on all nodes in the cluster. All nodes should resolve correctly and use IPv4 protocol.

    Hope this helps.

    Thanks,
    Seth

    #19968

    Hi Seth,

    This is a lengthy process indeed. Thank you all for your persistence. :)

    Now, first some confirmations. rrdcached is indeed running. The output from starting service hdp-gmetad is analogous to your description. Also sockets are listening on expected ports according to grep -A4 8660 /etc/ganglia/hdp/gmetad.conf. Finally I do have identical rrd tool packages installed.

    Now, I do question my output from netstat -anp| grep '8660\|8661\|8662\|8663'. First, it appears that gmond instances are bound to the IPv6 address of the node. Is this ok, or should I disable IPv6?

    Second, the port is indeed not in listening state. If there is no support for IPv6 I can indeed understand this. If, however, there is support for IPv6 do you see any other more or less obvious reason for why this could be so?
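
A rough way to read the binding question off the netstat output itself: the sketch below takes `netstat -anp` output and labels each local address on ports 8660-8663 as IPv4 or IPv6 by counting the colons in the address. The function name `ganglia_bindings` is hypothetical, and the sketch assumes the usual net-tools layout in which the fourth column is the local address.

```shell
# Reads `netstat -anp` output on stdin and prints protocol, local address and
# address family for every socket bound to ports 8660-8663.
# An address with more than one colon (e.g. ":::8660") is treated as IPv6.
ganglia_bindings() {
  awk '$4 ~ /:866[0-3]$/ {
         n = gsub(/:/, ":", $4)   # gsub returns the number of colons replaced
         print $1, $4, (n > 1 ? "IPv6" : "IPv4")
       }'
}

# Usage on the Ganglia collector node:
#   netstat -anp | ganglia_bindings
```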

    #19955

    Seth Lyubich
    Keymaster

    Hi Jens,

    I think the method of time synchronization is not as important as actual time synchronization across all nodes in the cluster. One method should be sufficient.

    Also, to answer your previous question directory /var/lib/ganglia/rrds should contain directories with rrd files.

    If you don’t have data there you might have an issue with the rrd tool. You can try the following:

    Make sure the rrdcached process is running. Usually the rrd tool gets started with the hdp-gmetad service:

    #service hdp-gmetad start

    Starting hdp-gmetad…
    =============================
    /usr/bin/rrdcached already running with PID 24053
    /usr/sbin/gmetad already running with PID 24083

    If you have hdp-gmetad and hdp-gmond running make sure that corresponding ports are listening:

    netstat -anp | grep '8660\|8661\|8662\|8663'

    Finally, check that sockets are listening on expected configured ports (not localhost):

    grep -A4 8660 /etc/ganglia/hdp/gmetad.conf

    You should see something like the output below. Make sure that the socket does not point to localhost.

    [root@ambari1 hdp]# grep -A4 8660 /etc/ganglia/hdp/gmetad.conf
    data_source "HDPSlaves" ambari1:8660
    data_source "HDPNameNode" ambari1:8661
    data_source "HDPJobTracker" ambari1:8662
    data_source "HDPHBaseMaster" ambari1:8663
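
The localhost check can also be done mechanically. This is a minimal sketch, assuming the `data_source "name" host:port` layout shown above; the function name `localhost_data_sources` is made up for illustration.

```shell
# Reads a gmetad.conf on stdin and prints every active (uncommented)
# data_source line that points at localhost or 127.0.0.1.
# Hypothetical helper; prints nothing when all sources use real host names.
localhost_data_sources() {
  grep -E '^[[:space:]]*data_source[[:space:]]' \
    | grep -E '[[:space:]](localhost|127\.0\.0\.1)(:[0-9]+)?([[:space:]]|$)'
}

# Usage on the Ganglia collector node:
#   localhost_data_sources < /etc/ganglia/hdp/gmetad.conf
```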

    One more thing you can check is rrd tool packages. This is what I have on my machine:

    [root@ambari1 hdp]# rpm -qa|grep rrd
    rrdtool-1.4.5-1.el6.x86_64
    perl-rrdtool-1.4.5-1.el6.x86_64
    python-rrdtool-1.4.5-1.el6.x86_64

    Hope this helps,

    Thanks,
    Seth

    #19885

    Hi Ted,

    The Ganglia service is indeed running according to the Ambari services tab.

    Also I find no residual non-HDP gmetad or gmond process running on any of my nodes after I have stopped the Ambari Ganglia service. I thereafter restart Ganglia from Ambari services tab and get stable green status. I use check scripts in directory /usr/libexec/hdp/ganglia to test the state of HDP gmond and rrdcached processes after launching from Ambari service tab. Looks ok. There is, however, no test script for HDP gmetad process in the same directory.

    As suggested by the Ambari front-end I tried to telnet into localhost on my Ganglia collector node in order to see the Ganglia XML tree: telnet 127.0.0.1 8652. Trying this the connection was accepted but there seems to be no output reaching this port.
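
A possible explanation for the silent connection, offered as an assumption rather than a diagnosis: in a stock gmetad, 8652 is the interactive port, which waits for a query line such as `/` before it answers, while the plain XML dump is served on 8651. Whether this HDP configuration keeps those defaults is not confirmed anywhere in the thread. The sketch below counts the hosts in whatever XML comes back; the function name `count_ganglia_hosts` is hypothetical, and `nc` is assumed to be installed.

```shell
# Reads Ganglia XML on stdin and prints how many <HOST ...> elements it holds.
# Prints 0 when the stream is empty or contains no hosts.
count_ganglia_hosts() {
  grep -o '<HOST ' | wc -l | tr -d ' '
}

# Usage on the Ganglia collector node (ports are the stock gmetad defaults):
#   printf '/\n' | nc 127.0.0.1 8652 | count_ganglia_hosts
#   nc 127.0.0.1 8651 < /dev/null | count_ganglia_hosts
```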

    Final point. On our cloud-hosted nodes the only piece of DNS config changed manually after the automated installation of the OS was to add all 4 HDP nodes to /etc/hosts. Each entry has the form IP_ADDRESS HOST_NAME.DOMAIN_NAME HOST_NAME. This ought to be the correct interpretation of http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.2/bk_using_Ambari_book/content/ambari-chap1-5-4.html, right? Is it possible that DNS misconfiguration could lead to the problems I have getting Ganglia metrics into Ambari?
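
The entry format described here (IP address, fully qualified name, short name) can be spot-checked with a short filter. This is only a sketch: the function name `malformed_host_entries` is hypothetical, and "fully qualified" is approximated as "the second field contains a dot".

```shell
# Reads an /etc/hosts file on stdin and prints every non-loopback entry that
# does not look like "IP_ADDRESS HOST_NAME.DOMAIN_NAME HOST_NAME".
malformed_host_entries() {
  awk '/^[ \t]*(#|$)/ { next }        # skip comments and blank lines
       $1 ~ /^(127\.|::1$)/ { next }  # skip loopback entries
       NF < 3 || $2 !~ /\./ { print }'
}

# Usage on each node:
#   malformed_host_entries < /etc/hosts
```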

    #19775

    tedr
    Member

    Hi Jens,

    When your cluster is up and running, what is the status of the dot by Ganglia on the Ambari services tab? If it is green and stays that way, then you should be getting some Ganglia information. If it is red, that means the Ganglia process that Ambari tried to start did not start up. The usual culprit when Ambari cannot start Ganglia is the system starting its own Ganglia process outside of Ambari. To get Ganglia info to show in the Ambari web page you need to stop these processes started by the system. The system-started processes are those listed as ‘gmond’ and ‘gmetad’, without the ‘hdp’ prepended. If these non-hdp processes are there, stop them and then try to restart Ganglia in Ambari.
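
One way to spot such stray processes, sketched under a loud assumption: that Ambari-managed daemons are launched with a config file under /etc/ganglia/hdp (the directory mentioned elsewhere in this thread), while system-started ones are not. The function name `non_hdp_ganglia_pids` is hypothetical; check the command lines on your own nodes before killing anything.

```shell
# Reads `ps -eo pid,args` output on stdin and prints the PID of every gmond or
# gmetad process whose command line does not mention /etc/ganglia/hdp.
# ASSUMPTION: HDP-managed daemons reference a config under /etc/ganglia/hdp.
non_hdp_ganglia_pids() {
  awk '$2 ~ /^(.*\/)?(gmond|gmetad)$/ && $0 !~ /\/etc\/ganglia\/hdp/ { print $1 }'
}

# Usage on each node:
#   ps -eo pid,args | non_hdp_ganglia_pids
```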

    Thanks,
    Ted.

    #19768

    Hi Seth

    After a couple of days offline I am back at the search for solution. :) Regarding synchronization I installed ntp on all nodes, so I believe I took care of this. Also iptables (as well as ip6tables) and selinux are off.

    Actually, come to think of it the nodes probably already had ntpdate installed before the installation of ntp. As these to my knowledge are mutually exclusive, could that possibly explain something? I will however uninstall the latter to see if there is any effect.

    Also, where does Ambari pick up the Ganglia data? Does it read rrd files from /var/lib/ganglia/rrds? I find that there is no output to this directory, so I ask myself whether the problem could also be related to the rrdtool caching daemon. Or doesn’t Ambari pick up metric data from rrd output?
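
To confirm whether anything is being written under that directory, a quick sketch is to count the .rrd files touched in the last few minutes. The function name `recent_rrd_count` and the 5-minute default window are arbitrary choices made for illustration.

```shell
# Prints the number of *.rrd files under DIR modified within the last MINUTES.
# Defaults: DIR=/var/lib/ganglia/rrds, MINUTES=5 (both are illustrative).
recent_rrd_count() {
  dir=${1:-/var/lib/ganglia/rrds}
  minutes=${2:-5}
  find "$dir" -type f -name '*.rrd' -mmin "-$minutes" 2>/dev/null | wc -l | tr -d ' '
}

# Usage on the Ganglia collector node:
#   recent_rrd_count /var/lib/ganglia/rrds 5
```

A result of 0 on a running cluster would point at the collection or caching side rather than at Ambari's display.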

    #19440

    Seth Lyubich
    Keymaster

    Hi Jens,

    One more thing you can check is to make sure that time is synchronized in your cluster and to make sure that selinux and iptables are off.

    Hope this helps,

    Thanks,
    Seth

    #19439

    mike becker
    Member

    It has this message on the Services dashboards.
    When you select Services and then HDFS, the lower section of the report, entitled Metrics, is empty.

    I have looked for these types of reports in the ganglia section of /etc/ganglia/hdp, but none of those types of reports are included in the configuration.

    #19437

    Hi Robert,
    A bit of progress here. Inspired by the posting you suggested, I find definite signs of data transfer at the gmond server using tcpdump -p -s 0 -w - udp port 8660. I repeated the same exercise on all ports in the range 8660-8663, as suggested by the HDP ganglia configuration.

    However, I am not able to telnet into any of these ports from other nodes. Thus I’ll have to look into non-HDP configuration on my nodes for further leads. Will let you know the outcome.

    #19434

    Robert
    Participant

    Hi Jens,
    I just finished retesting on a single-node cluster, a fresh HDP 1.2.2 on CentOS 6.3; everything installed fine and I was able to see the ganglia graphs. Thus, the HDP 1.2.2 package seems to be fine and contains the default cluster metric definitions to be displayed. I believe these files are located in /etc/ganglia/hdp/, and they are specifically for monitoring specific hadoop services. Now, as far as the error you are getting, “No matching metrics detected”: have you tried a different browser, or logging in to the machine serving gmetad and verifying whether you get the same message there? Also, I found this posting, which talks about the error; the user was asked to use a tool to verify that the gmond servers were sending out metrics.

    http://comments.gmane.org/gmane.comp.monitoring.ganglia.general/2737

    Regards,
    Robert

    #19347

    Robert
    Participant

    Hi Jens,
    You are right, I forgot about the hdp versions of those services. Interesting. Let me run a quick test, since I believe I upgraded my Ambari from HDP 1.2.1.

    Regards,
    Robert

    #19340

    Hi Robert,
    Both services are running. I.e. gmond is running 4 concurrent instances.

    As a side note the services were started as “hdp-gmond” and “hdp-gmetad”. A new restart of these services by “service restart” also didn’t change much.

    #19339

    Robert
    Participant

    Hi Jens,
    By default, HDP comes with default cluster metric definitions, since those are what is normally displayed in the ambari user interface. Can you go to the node which is running the gmond and gmetad services and run the commands

    service gmond status
    service gmetad status

    If one of them is off run the start option
    service gmond start
    service gmetad start

    Hope that helps.
    Regards,
    Robert

    #19308

    The natural question here is whether HDP comes with default cluster metrics definitions or not. :)

    #19303

    Hi Larry,

    Thanks a lot for your quick reply.

    All I see from the Web UI is static empty graphs with accompanying “No matching metrics detected” message. Restart of ganglia from the Ambari UI was tested but changed nothing in terms of metrics output.

    Any other suggestion?

    #19301

    Larry Liu
    Moderator

    Hi, Jens

    When you go into http://$gangliahostname/ganglia/, do you see ganglia metrics?

    Can you please also try to restart ganglia from ambari UI?

    Larry
