Shutdown services in Ambari and now unable to start back up


This topic contains 10 replies, has 5 voices, and was last updated by  Robert 2 months ago.

  • Creator
    Topic
  • #16715

    Hello,

    Working with Hortonworks professional services, we have had a 20-node HDP 1.2 cluster up and running for a couple of weeks, using Ambari to manage the cluster. This afternoon we stopped services using Ambari to try and make configuration changes. The services appear to have been stopped but we kept receiving errors when attempting to save configuration changes to either MapReduce or HDFS, stating that the services still needed to be stopped.

    Currently, the HDFS and Nagios services are blinking red after an attempt to start them back up. Looking at the ambari server and client logs, I don’t see anything that jumps out as the cause of the issue. Any advice to help with troubleshooting this issue (and getting our cluster back up) would be greatly appreciated. I can provide any files that would be useful.

    Thanks in advance,
    -Bobby

Viewing 10 replies - 1 through 10 (of 10 total)


  • Author
    Replies
  • #18435

    Robert
    Member

    Hi Bobby,
    For reference, here is the defect:

    https://issues.apache.org/jira/browse/AMBARI-1582

    Regards,
    Robert

    #18430

    Thanks for everyone’s replies. In the end, we worked with Hortonworks support to discover (via some REST calls) that Ambari thought several “components” were in an odd, failed state. With their help in “resetting” the state of these components and configuration versions, things are back to normal.

    Based on what I remember from the conversations, it seems that there is a bug open for this type of behavior. Attempting to save a configuration after a potential “unclean/incomplete” shutdown of services combined with a restart of the Ambari server processes appears to contribute to the triggering of this bug.
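    For anyone who lands here with the same symptom, the kind of REST check mentioned above might look roughly like the sketch below. It is not the set of calls support actually ran. It assumes the Ambari API is served under /api/v1, that each host component reports HostRoles/state and HostRoles/desired_state, and that the server URL, cluster name, and credentials are placeholders you would replace. The state strings in the sample payload are illustrative only.

```python
import base64
import json
import urllib.request


def find_mismatched_components(payload):
    """Return (host, component, state, desired_state) tuples where the two states differ."""
    mismatched = []
    for item in payload.get("items", []):
        roles = item.get("HostRoles", {})
        state = roles.get("state")
        desired = roles.get("desired_state")
        if state != desired:
            mismatched.append(
                (roles.get("host_name"), roles.get("component_name"), state, desired)
            )
    return mismatched


def fetch_host_components(base_url, cluster, user, password):
    """Fetch host component states from an Ambari server.

    The URL layout and field names are assumptions; check them against
    the API of the Ambari version you are running.
    """
    url = (
        f"{base_url}/api/v1/clusters/{cluster}/host_components"
        "?fields=HostRoles/state,HostRoles/desired_state"
    )
    request = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical sample payload, so the filter can be tried without a server.
    sample = {
        "items": [
            {"HostRoles": {"host_name": "node1", "component_name": "JOBTRACKER",
                           "state": "START_FAILED", "desired_state": "STARTED"}},
            {"HostRoles": {"host_name": "node2", "component_name": "DATANODE",
                           "state": "STARTED", "desired_state": "STARTED"}},
        ]
    }
    for row in find_mismatched_components(sample):
        print(row)
```

    Against a live server you would pass the result of fetch_host_components(...) to find_mismatched_components(...) instead of the sample. Any component it lists is one where Ambari's recorded state disagrees with the state it is trying to reach.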

    #16782

    Can you confirm/check the following:

    - On the JobTracker host, can you see the process running while Ambari Web says MapReduce is not running?
    - If JT is running, can you check the location of the PID file? By default, it’s usually in /var/run/hadoop/mapred (where mapred is the user account that runs the JobTracker).
    - Did you customize either the PID directory or the user accounts during install? Those settings were on the “Customize Services” section of the wizard.

    - Regarding Nagios, did you install the Nagios server on the same host as your Ganglia server?
    - On that host, if you run “service nagios start”, does Nagios start and show as started in Ambari Web?
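    The PID question above can be checked with a small script. This is a minimal sketch for a POSIX host: it reads a PID file and asks the kernel whether that process exists. The file name in the usage comment (hadoop-mapred-jobtracker.pid) is an assumption based on the default directory mentioned above, so adjust it to whatever is actually in your PID directory.

```python
import os


def pid_file_is_live(pid_file):
    """Return True if pid_file holds the PID of a process that currently exists."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed PID file
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False  # stale PID file: no such process
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


# Example (path is an assumption, not confirmed for this cluster):
# print(pid_file_is_live("/var/run/hadoop/mapred/hadoop-mapred-jobtracker.pid"))
```

    If the JobTracker is visible in the process list but this returns False for the PID file in the configured directory, the daemon was probably started with a different PID directory or user than the one set in the wizard.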

    #16777

    tedr
    Member

    Hi Bobby,

    If the dot for a particular component is blinking, it means that Ambari thinks there is still a background process outstanding for it. If the dot is red and blinking, Ambari thinks the component is shutting down; if green, starting up. In the upper left of the Ambari management page, next to the cluster name, is there a number in a blue box? If so, this is the number of processes that Ambari thinks it is waiting on, and it won’t move on until these are done. As for Ganglia: when Ganglia is installed it gets hooked into the automatic start on boot, and the processes from that startup need to be killed before Ambari can start Ganglia itself. You can kill these with a “killall -9 gmond” and a “killall -9 gmetad”.

    Thanks,
    Ted.

    #16775

    Hi Ted,

    This morning, I have gone ahead and started the various processes up manually. Based on the Nagios that comes with Ambari, everything is up and running as expected and running jps on the nodes confirms this. However, the only service that appears to be okay in Ambari is HDFS. All other services are red with MapReduce and Nagios continuing to blink red.

    I am still working to get the Ganglia server process (gmetad) working correctly (even manually) but I feel that is the least of my concerns. I’m just looking to get Ambari back to a point where the cluster can be managed, if that is at all possible. Worst case, we will ditch Ambari and use scripts (as it seems very fragile currently).

    Thanks,
    -Bobby

    #16764

    tedr
    Member

    Hi Bobby,

    Does jps on the jobtracker and tasktracker nodes show that these processes are still running? Usually Ambari will catch up to what is actually running fairly quickly unless there is a background process running.

    Thanks,
    Ted.
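    If you need to run the jps check on many nodes, it can be scripted. The sketch below only parses jps-style output ("<pid> <main class>" per line) and reports which expected daemons are absent. The list of expected daemon names is an assumption for an HDP 1.x MapReduce node, and collecting the output from each host (for example over ssh) is left to you.

```python
import subprocess


def parse_jps(output):
    """Map PID -> main class name from jps output lines of the form '<pid> <name>'."""
    daemons = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            daemons[int(parts[0])] = parts[1].strip()
    return daemons


def missing_daemons(output, expected):
    """Return the expected daemon names that do not appear in the jps output."""
    running = set(parse_jps(output).values())
    return sorted(set(expected) - running)


if __name__ == "__main__":
    # Runs jps on the local host; requires a JDK on the PATH.
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    # Expected names are an assumption; adjust per node role.
    print(missing_daemons(result.stdout, ["JobTracker", "TaskTracker"]))
```

    An empty list means every expected daemon was found. If the list is empty on a node but Ambari still shows the service as down, the mismatch is on Ambari's side rather than Hadoop's.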

    #16734

    Starting the NameNode, SecondaryNameNode, and DataNode processes manually, I have been able to get the HDFS service to go green in Ambari. I have not had the same luck with the MapReduce service… the JobTracker and TaskTracker processes are running but they still show as being down in Ambari.

    I have tried restarting the Ambari server and the Ambari clients with little success in resolving the issue. My gut feeling is this is an agent/puppet issue, but I don’t see anything in the Ambari client/server logs that jumps out at me as the problem and, unfortunately, I don’t know how to check the puppet side of things (if it’s even possible).

    #16718

    Please note that currently the only process I am able to start via Ambari is the NameNode which is on the same server as the ambari-server process.

    iptables is not running on any of the servers making up the cluster.

    #16717

    Hi Yi,

    I attempted to make a change to the dfs.umaskmode parameter via the Ambari interface (Services –> HDFS –> Configs –> Advanced) but it wouldn’t let me save, stating that I needed to stop the HDFS and MapReduce services, which appeared to have been stopped. Attempting to bring up the services results in the blinking red dots next to the components in the Services tab and a long delay before the background operations go away.

    Looking at the ambari agent logs on the different servers, the agents don’t appear to be doing anything other than responding to status-type messages.

    Thanks,
    -Bobby

    #16716

    yi zhang
    Member

    Hi Bobby,

    How did you make the changes and what are the changes? Ambari overwrites customized changes if they are not made through Ambari.

    Thanks,
    Yi.
