Shutdown services in Ambari and now unable to start back up

This topic contains 12 replies, has 7 voices, and was last updated by Dave 11 months, 2 weeks ago.

  • Creator
    Topic
  • #16715

    Hello,

Working with Hortonworks professional services, we have had a 20-node HDP 1.2 cluster up and running for a couple of weeks, using Ambari to manage the cluster. This afternoon we stopped services through Ambari in order to make configuration changes. The services appear to have stopped, but we kept receiving errors when attempting to save configuration changes to either MapReduce or HDFS, stating that the services still needed to be stopped.

    Currently, the HDFS and Nagios services are blinking red after an attempt to start them back up. Looking at the ambari server and client logs, I don’t see anything that jumps out as the cause of the issue. Any advice to help with troubleshooting this issue (and getting our cluster back up) would be greatly appreciated. I can provide any files that would be useful.

    Thanks in advance,
    -Bobby

Viewing 12 replies - 1 through 12 (of 12 total)

The topic ‘Shutdown services in Ambari and now unable to start back up’ is closed to new replies.

  • Author
    Replies
  • #43171

    Dave
    Moderator

    Hi Sushant,

This was caused by a defect in Ambari in HDP 1.2.
Can you start a new thread detailing the version of HDP you are using and the exact issue you are facing?

    Thanks

    Dave

    #43169

    SUSHANT Kadadi
    Participant

How exactly was this issue resolved? I am facing the same issue.

    #18435

    Robert
    Participant

    Hi Bobby,
    For reference, here is the defect:

    https://issues.apache.org/jira/browse/AMBARI-1582

    Regards,
    Robert

    #18430

Thanks for everyone’s replies. In the end, we worked with Hortonworks support and discovered (via some REST calls) that Ambari thought several “components” were in an odd, failed state. With their help in “resetting” the state of these components and the configuration versions, things are back to normal.

Based on what I remember from the conversations, there is a bug open for this type of behavior. Attempting to save a configuration after an “unclean/incomplete” shutdown of services, combined with a restart of the Ambari server process, appears to trigger this bug.
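For anyone who hits this later, the REST calls involved look roughly like the sketch below. The host names, cluster name, and credentials are placeholders, and JOBTRACKER is just an example component; this illustrates the approach, not the exact commands support used.

```shell
# Placeholder endpoint and credentials -- substitute your own.
AMBARI="http://ambari.example.com:8080"
CLUSTER="MyCluster"
AUTH="admin:admin"

# Payload that tells Ambari to record the component as cleanly stopped.
PAYLOAD='{"HostRoles": {"state": "INSTALLED"}}'

# 1. Inspect the state Ambari has recorded for a component on a host:
curl -s -u "$AUTH" \
  "$AMBARI/api/v1/clusters/$CLUSTER/hosts/worker1.example.com/host_components/JOBTRACKER" \
  || echo "request failed (is Ambari reachable?)"

# 2. Reset a component stuck in a transitional/failed state.
#    (Some Ambari versions require the X-Requested-By header; it is harmless otherwise.)
curl -s -u "$AUTH" -H "X-Requested-By: ambari" -X PUT -d "$PAYLOAD" \
  "$AMBARI/api/v1/clusters/$CLUSTER/hosts/worker1.example.com/host_components/JOBTRACKER" \
  || echo "request failed (is Ambari reachable?)"
```

After a reset like this, refresh Ambari Web and the component should show as stopped rather than stuck in a blinking transitional state.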

    #16782

    Jeff Sposetti
    Moderator

    Can you confirm/check the following:

– On the JobTracker host, can you see the process running while Ambari Web says MapReduce is not running?
– If the JobTracker is running, can you check the location of its PID file? By default it is under /var/run/hadoop/mapred (where mapred is the user account that runs the JobTracker).
– Did you customize either the PID directory or the user accounts during install? Those settings are on the “Customize Services” page of the install wizard.

– Regarding Nagios, did you install the Nagios server on the same host as your Ganglia server?
– On that host, if you run “service nagios start”, does Nagios start and show as started in Ambari Web?
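The PID check in the second bullet can be scripted along these lines. The exact pid-file name below is an assumption based on the usual hadoop-<user>-<daemon>.pid naming convention; adjust it to whatever you actually find in the directory.

```shell
# Default Hadoop 1.x PID directory for the mapred user; the file name follows
# the usual hadoop-<user>-<daemon>.pid convention and may differ on your cluster.
PID_FILE=/var/run/hadoop/mapred/hadoop-mapred-jobtracker.pid

if [ -f "$PID_FILE" ]; then
    PID=$(cat "$PID_FILE")
    # Is the process Ambari recorded actually alive?
    if ps -p "$PID" > /dev/null 2>&1; then
        echo "JobTracker running with PID $PID"
    else
        echo "Stale PID file: no process with PID $PID"
    fi
else
    echo "No PID file at $PID_FILE"
fi
```

A stale PID file (file exists but the process is gone, or vice versa) is exactly the kind of mismatch that makes Ambari report a running daemon as down.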

    #16777

    tedr
    Member

    Hi Bobby,

If the dot for a particular component is blinking, it means Ambari thinks there is still a background operation running for it. If the dot is red and blinking, Ambari thinks the component is shutting down; if green, starting up. In the upper left of the Ambari management page, next to the cluster name, is there a number in a blue box? If so, that is the number of operations Ambari thinks it is waiting on, and it won’t move on until these are done.

The issue with Ganglia is that when Ganglia is installed it gets hooked into automatic start on boot, and the processes from that startup need to be killed before Ambari can start it itself. You can kill them with a “killall -9 gmond” and a “killall -9 gmetad”.
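As a copy-paste sketch of the kill step above (assumes a distro with killall and pgrep installed, e.g. CentOS/RHEL):

```shell
# Kill the boot-started Ganglia daemons so Ambari can start them itself.
# killall exits nonzero when no matching process exists, so don't abort on that.
killall -9 gmond  2>/dev/null || true
killall -9 gmetad 2>/dev/null || true

# Confirm nothing is left running:
pgrep -x gmond  >/dev/null 2>&1 && echo "gmond still running"  || echo "gmond stopped"
pgrep -x gmetad >/dev/null 2>&1 && echo "gmetad still running" || echo "gmetad stopped"
```

Once both daemons are stopped, starting Ganglia from Ambari Web should succeed instead of colliding with the boot-time instances.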

    Thanks,
    Ted.

    #16775

    Hi Ted,

This morning I went ahead and started the various processes manually. According to the Nagios instance that comes with Ambari, everything is up and running as expected, and running jps on the nodes confirms this. However, the only service that appears to be okay in Ambari is HDFS. All other services are red, with MapReduce and Nagios continuing to blink red.

    I am still working to get the Ganglia server process (gmetad) working correctly (even manually) but I feel that is the least of my concerns. I’m just looking to get Ambari back to a point where the cluster can be managed, if that is at all possible. Worst case, we will ditch Ambari and use scripts (as it seems very fragile currently).

    Thanks,
    -Bobby

    #16764

    tedr
    Member

    Hi Bobby,

    Does jps on the jobtracker and tasktracker nodes show that these processes are still running? Usually Ambari will catch up to what is actually running fairly quickly unless there is a background process running.

    Thanks,
    Ted.

    #16734

    Starting the NameNode, SecondaryNameNode, and DataNode processes manually, I have been able to get the HDFS service to go green in Ambari. I have not had the same luck with the MapReduce service… the JobTracker and TaskTracker processes are running but they still show as being down in Ambari.

I have tried restarting the Ambari server and the Ambari agents with little success in resolving the issue. My gut feeling is that this is an agent/puppet issue, but I don’t see anything in the Ambari agent or server logs that points to the problem and, unfortunately, I don’t know how to check the puppet side of things (if that’s even possible).
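For reference, the restarts described above and the default Ambari log locations look like this (paths assume a stock Ambari install; the grep pattern is just a starting point for scanning the logs):

```shell
# Restart the Ambari server (on the server host) and agent (on every node).
ambari-server restart 2>/dev/null || echo "ambari-server not on this host"
ambari-agent restart  2>/dev/null || echo "ambari-agent not on this host"

# Scan the default server/agent logs for recent errors:
for log in /var/log/ambari-server/ambari-server.log \
           /var/log/ambari-agent/ambari-agent.log; do
    if [ -f "$log" ]; then
        echo "== $log =="
        grep -iE 'error|exception' "$log" | tail -n 20
    fi
done
```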

    #16718

Please note that currently the only process I am able to start via Ambari is the NameNode, which is on the same server as the ambari-server process.

iptables is not running on the servers making up the cluster.

    #16717

    Hi Yi,

    I attempted to make a change to the dfs.umaskmode parameter via the Ambari interface (Services –> HDFS –> Configs –> Advanced) but it wouldn’t let me save, stating that I needed to stop the HDFS and MapReduce services, which appeared to have been stopped. Attempting to bring up the services results in the blinking red dots next to the components in the Services tab and a long delay before the background operations go away.

    Looking at the ambari agent logs on the different servers, the agents don’t appear to be doing anything other than responding to status-type messages.

    Thanks,
    -Bobby

    #16716

    Yi Zhang
    Moderator

    Hi Bobby,

    How did you make the changes and what are the changes? Ambari overwrites customized changes if they are not made through Ambari.

    Thanks,
    Yi.
