A little background: I have been using HDP 1.2 for a long time and have never faced this issue with it.
Yesterday, I installed a new HDP 1.3.2 cluster on Amazon EC2 through Ambari. After installation, all services were running and all smoke tests passed. I then stopped all the services by clicking the new “Stop All” button, stopped all the Ambari agents, and finally stopped the Ambari server.
This morning, I started my instances (the ones used for HDP 1.3.2) again and started the Ambari server. This time I tried to start the services one by one (I also tried the “Start All” button but hit the same issue). I began with the HDFS service, and it fails to start.
Under HDFS, all DataNodes started successfully. But when it tries to start the client, the NameNode, and the Secondary NameNode, the start task halts partway through:
10.0.0.149 (client machine) – It stops after 50%
10.0.0.75 (NameNode) – It stops after 35%
10.0.0.76 (JobTracker, Secondary NameNode) – It stops at 0%
I am using a VPC in Amazon EC2, so the instances’ private IPs remain unchanged after a restart/reboot. I have exactly the same configuration with HDP 1.2 and never faced this issue there.
After several minutes, the NameNode start task shows the following error (stderr):
Puppet has been killed due to timeout
Any information would be highly appreciated. Thanks in advance.