Hey guys, I’m running an HDP 2.1 cluster on a set of machines running CentOS 6.5.
I have successfully installed the cluster (a few times now), but I run into consistent issues when restarting the machines that make up the cluster. I’d like to resolve the issue and get my cluster running again, as well as figure out a procedure for avoiding it in the future.
First, I made sure to run Stop All from the Ambari interface. Then I shut down all the systems, including the system running ambari-server. Is there anything wrong with this procedure? Should I run ambari-server stop on the ambari-server system before shutting that machine down?
Anyhow, I powered on all the systems and waited to ensure they were all up. Then I opened the web interface on the ambari-server machine and logged in. I ran Start All, and it consistently fails, usually after around 11 or 12 seconds.
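For reference, here is a rough sketch of what I understand the Stop All and Start All buttons to correspond to in the Ambari REST API, in case it helps anyone reproduce this outside the web UI. It only builds the requests and does not send anything. The host name `ambari-host`, the cluster name `mycluster`, and the helper `build_state_request` are placeholders I made up for illustration; port 8080 is assumed to be the Ambari server port, and authentication is left out.

```python
import json


def build_state_request(host, cluster, state, context, port=8080):
    """Describe the PUT request asking Ambari to move all services to `state`.

    host, cluster and port are placeholders for this sketch, not values
    taken from my actual cluster.
    """
    return {
        "method": "PUT",
        "url": f"http://{host}:{port}/api/v1/clusters/{cluster}/services",
        # Ambari rejects modifying requests that lack this header.
        "headers": {"X-Requested-By": "ambari"},
        "body": json.dumps({
            "RequestInfo": {"context": context},
            "Body": {"ServiceInfo": {"state": state}},
        }),
    }


# In this API a stopped service is in the INSTALLED state,
# and a running one is in the STARTED state.
stop_all = build_state_request(
    "ambari-host", "mycluster", "INSTALLED", "Stop All Services")
start_all = build_state_request(
    "ambari-host", "mycluster", "STARTED", "Start All Services")

for request in (stop_all, start_all):
    print(request["method"], request["url"])
    print(request["body"])
```

Each dictionary could be handed to any HTTP client along with the admin credentials; I have only used the web interface so far, so treat this as my understanding of the API rather than something I have verified against this cluster.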
The odd thing is that when you look at the individual services on the nodes, a failed service will sometimes have the red circle with an exclamation point and sometimes the yellow bar. It’s not really clear what either of those means, since there is no key, but I assume the red is a complete failure.
When I click on any of the failed services, I get messages indicating that the service failed to install, although it was previously installed and I never wanted to install it again. I only wanted to start my cluster.
I would attach a screenshot of one of the task lists for a node, but I don’t see any option to do so.
I can paste the stdout and stderr for one of the red services on a particular node, if that would be helpful.
What does the yellow bar mean as compared to the red circle?