Add Host Wizard stuck on failed install, cannot add new host

This topic contains 11 replies, has 3 voices, and was last updated by Ryan 2 weeks, 6 days ago.

  • #33728

    M F
    Member

    Earlier today I tried to add a new host to my Ambari-managed cluster. I managed to mess a few things up, and neither the DataNode nor the JobTracker installed properly. I decided to wipe the node and start over. I’m working on Amazon EC2, so my old IP address is long gone by now.

    My problem is that I cannot get the Add New Host wizard unstuck from the old install. It’s on the Install, Start and Test screen and won’t let me click Next, and there’s no Back button! I even tried rebooting the entire cluster, but when it comes back up it puts me right back on this screen. Does anybody know how to get past this? Please help; I really don’t want to scrap the cluster and reinstall from scratch because of this.

Viewing 11 replies - 1 through 11 (of 11 total)


  • #50702

    Ryan
    Participant

    Hi MF, were you able to solve your issue using the suggestions from Dave? I am facing the same problem.
    Thanks,
    Ryan

    #34631

    Dave
    Moderator

    Hi MF,

    Please try these steps:

    1. Create a file called fix.sql on the Ambari server host with the following content:

    delete from ambari.key_value_store where key='CLUSTER_CURRENT_STATUS';

    insert into ambari.key_value_store values ('CLUSTER_CURRENT_STATUS','{"clusterState":"CLUSTER_STARTED_5"}');

    2. Run the SQL script:

    psql -U ambari-server ambari < fix.sql

    (enter password "bigdata" when prompted)

    When it runs, you should see output similar to:

    DELETE 1

    INSERT 0 1

    3. Restart the Ambari server:

    ambari-server restart

    4. Log out and then log back into the Ambari web interface. Clicking the tab with your cluster name should no longer show the 'Add Hosts' wizard stuck at the 'Install, Start and Test' stage.
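If you want to sanity-check fix.sql before running it against the live Ambari database, you can replay the same two statements against a throwaway SQLite copy. Note that the statements need plain ASCII quotes for psql to accept them. This is a minimal sketch under a few assumptions. It assumes a two-column key_value_store(key, value) table. SQLite's ATTACH ... AS ambari stands in for the PostgreSQL ambari schema. The starting value "ADD_HOSTS_STUCK" is a hypothetical placeholder, not what Ambari actually stores.

```python
import json
import sqlite3

# The two statements from fix.sql, written with straight ASCII quotes.
FIX_SQL = """
delete from ambari.key_value_store where key='CLUSTER_CURRENT_STATUS';
insert into ambari.key_value_store values ('CLUSTER_CURRENT_STATUS','{"clusterState":"CLUSTER_STARTED_5"}');
"""


def make_test_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for key_value_store (assumed two-column schema)."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database named 'ambari' so the
    # schema-qualified name ambari.key_value_store resolves as it does in PostgreSQL.
    conn.execute("ATTACH DATABASE ':memory:' AS ambari")
    conn.execute("CREATE TABLE ambari.key_value_store (key TEXT PRIMARY KEY, value TEXT)")
    # Hypothetical 'stuck' wizard state standing in for the real row.
    conn.execute(
        "INSERT INTO ambari.key_value_store VALUES (?, ?)",
        ("CLUSTER_CURRENT_STATUS", json.dumps({"clusterState": "ADD_HOSTS_STUCK"})),
    )
    conn.commit()
    return conn


def current_status(conn: sqlite3.Connection) -> dict:
    """Return the decoded JSON stored under CLUSTER_CURRENT_STATUS."""
    row = conn.execute(
        "SELECT value FROM ambari.key_value_store WHERE key = 'CLUSTER_CURRENT_STATUS'"
    ).fetchone()
    return json.loads(row[0])


if __name__ == "__main__":
    conn = make_test_db()
    print("before:", current_status(conn))
    conn.executescript(FIX_SQL)  # runs both statements in order
    print("after: ", current_status(conn))
```

If the script parses and the "after" value decodes as valid JSON, the quoting is at least well-formed. On the real server, psql's DELETE 1 / INSERT 0 1 lines confirm the same thing.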

    Thanks

    Dave

    #34558

    Dave
    Moderator

    Hi MF,

    Can you log into psql on your machine:

    psql -U ambari ambari (default password bigdata)

    Then run 'select * from hosts;' and check: does the stuck host appear there?

    If so, stop your ambari-server and agents, delete that row from hosts, restart PostgreSQL (service postgresql restart), and then start the Ambari server and agents again.
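A manual delete is easy to get wrong, so a cautious pattern is to run it inside a transaction and commit only if exactly one row was affected. The sketch below shows that pattern against a throwaway SQLite table. The hosts(host_name) schema and the delete_stuck_host and make_hosts_db helpers are hypothetical, so check the real column names with the select query first. In the real Ambari database, other tables may also reference hosts, in which case a foreign key can reject the delete. As described above, stop ambari-server and the agents before touching the database.

```python
import sqlite3


class UnexpectedRowCount(Exception):
    """Raised when a delete would not remove exactly one host row."""


def delete_stuck_host(conn: sqlite3.Connection, host_name: str) -> None:
    """Delete one host row, or leave the table untouched.

    Hypothetical schema: a 'hosts' table keyed by a 'host_name' column.
    """
    # 'with conn' commits if the block succeeds and rolls back if it raises.
    with conn:
        cur = conn.execute("DELETE FROM hosts WHERE host_name = ?", (host_name,))
        if cur.rowcount != 1:
            raise UnexpectedRowCount(
                f"expected to delete 1 row for {host_name!r}, would delete {cur.rowcount}"
            )


def make_hosts_db(host_names):
    """Build an in-memory stand-in for the hosts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (host_name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO hosts (host_name) VALUES (?)", [(n,) for n in host_names])
    conn.commit()
    return conn


if __name__ == "__main__":
    # Hostname taken from the HeartBeatHandler warning in this thread.
    conn = make_hosts_db(["master.example.com", "16.sparticus.shadoop.distil.it"])
    delete_stuck_host(conn, "16.sparticus.shadoop.distil.it")
    print(conn.execute("SELECT host_name FROM hosts").fetchall())
```

The parameterized WHERE clause also avoids the quoting problems that typographic quotes cause when SQL is copied out of a web page.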

    Let me know how you get on.

    Thanks

    Dave

    #34406

    M F
    Member

    This is the only line I can find in the logs that looks amiss:

    ./ambari-server.log.1:13:27:31,805 WARN HeartBeatHandler:396 – Received registration request from host with non compatible agent version, hostname=16.sparticus.shadoop.distil.it, agentVersion=1.2.5.17, serverVersion=1.3.0

    This is in ambari-server.log, and it occurred when the node initially failed. There were version mismatches. I can’t find anything else in the logs I’m looking at, but sure enough, the node remains “stuck” in the installation process. If I knew a simple way to decommission the node, I bet I could clear it out of the installation screen.
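To check whether a log contains more of these registration warnings, a small filter can extract the hostname and both version numbers from each matching HeartBeatHandler line. This is a sketch that assumes the warning keeps the key=value layout of the line quoted in this post. The names parse_version_warning and find_version_mismatches are illustrative and not part of Ambari.

```python
import re
import sys

# Captures key=value pairs such as hostname=..., agentVersion=..., serverVersion=...
PAIR_RE = re.compile(r"(\w+)=([^,\s]+)")
MARKER = "non compatible agent version"


def parse_version_warning(line: str) -> dict:
    """Return the key=value fields of an incompatible-agent warning, or {} for other lines."""
    if MARKER not in line:
        return {}
    return dict(PAIR_RE.findall(line))


def find_version_mismatches(lines) -> list:
    """Collect the parsed fields for every incompatible-agent warning in the given lines."""
    return [fields for fields in map(parse_version_warning, lines) if fields]


if __name__ == "__main__":
    # Usage: python find_mismatches.py ambari-server.log ambari-server.log.1
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for fields in find_version_mismatches(fh):
                print(path, fields.get("hostname"), fields.get("agentVersion"), fields.get("serverVersion"))
```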

    #34259

    Dave
    Moderator

    Hi MF,

    What shows up in the ambari-server and ambari-agent logs (for the stuck host)?
    Also, what do the PostgreSQL logs show (/var/lib/pg_sql/data/pg_log/.log)? Possibly look at the day it got stuck and the following days.
    Does anything look amiss there, or in the agent / server logs?

    Thanks

    Dave

    #34236

    M F
    Member

    Bumping. I’d really appreciate any insights on decommissioning nodes or clearing failed installations.

    #33777

    M F
    Member

    That’s correct. Clicking “Add Host” brings me to the page in my screenshot, even if I reboot the Ambari server (not just restarting the ambari-server service, but rebooting the actual machine). At this point I’ve clobbered that EC2 instance altogether and would simply like to clear its record from Ambari so I can add a different machine. I can’t click back to “Install Options” or even edit the URL to go back to step 1; it brings me right back to the screen in the screenshot, and I can’t find any option to decommission the node. Very frustrating stuff!

    The server is still “ok”: the cluster is still running scheduled jobs fine, and I can use the existing machines. I just cannot extend the cluster at this point (the Add Host wizard is totally stuck on the screen I posted), and that’s going to be a problem in the coming weeks. If this cluster weren’t already handling live data I’d simply clobber the whole thing and start from scratch, but I’d really like to avoid that.

    Currently I have 1 master and 3 slaves; I was adding the 4th when this happened.

    #33770

    Dave
    Moderator

    Hi MF,

    No problem. So any time you click Add Host, you’re met with “errors encountered”? If you click on the left-hand side and choose “Install Options”, does it take you back?

    Is your ambari-server still ‘ok’? How many nodes do you have in your cluster, or do you want to start again?

    Thanks

    Dave

    #33739

    M F
    Member

    Dave,

    Sorry if I’m spamming you with replies, but I think I replied to my own post up above. I just wanted to reply to you here as well to make sure you see my response (there’s no edit or delete functionality on these boards??). Thanks again.

    #33738

    M F
    Member

    Dave,

    Thanks for your swift reply. Unfortunately I no longer have the log from the new host, because I foolishly clobbered the server before I could gather it (at the time I figured the error wasn’t worth saving). The original failure doesn’t matter anymore, since I now know what caused it: my automated install script didn’t specify a version in the yum command, so it grabbed a newer ambari-agent than the ambari-server currently running. I then used yum to revert the agent to the 1.2.4 variant the server was using, and that didn’t update the dependencies properly. The problem now is that this host, which is gone for good (terminated EC2 instance), is freezing up my Ambari client. Whenever I open the “add new hosts” wizard, I get the following screen: http://i.imgur.com/Etu4iE5.png

    That screenshot shows what I see whenever I hit “add new host”, even after rebooting the cluster! It’s simply stuck there, and I’m unable to add new hosts. I don’t even necessarily care about deleting this non-working host from the list (which appears to be unsupported in Ambari); I just want to be able to add another one, since I have now sorted out the issues that caused the failed install. My problem is that I can’t get “unstuck” from that screen: Next is greyed out and there’s no Back option (all of the headings on the left are greyed out and cannot be clicked).

    The error I see in my server log which seems relevant is: ./ambari-server.log.1:13:27:31,805 WARN HeartBeatHandler:396 – Received registration request from host with non compatible agent version, hostname=16.sparticus.shadoop.distil.it, agentVersion=1.2.5.17, serverVersion=1.3.0

    As I said, I sorted this out by making sure I installed the right agent version the next time around (my yum command was flubbed), but downgrading it with yum didn’t properly update the dependencies, and here I am, with the defunct host stuck in the Add Host wizard and seemingly no recourse.
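The root cause here was an unpinned yum install, so one way to avoid a repeat is to have the install script request an explicit ambari-agent version. This is a sketch under two assumptions. First, it treats the agent as compatible when its major.minor matches the server's; this is a simplification, not Ambari's documented rule. Second, it assumes the repository publishes the agent under the same version string the server reports. yum itself does accept a name-version package spec.

```python
def major_minor(version: str) -> tuple:
    """Return the first two numeric components, e.g. '1.2.5.17' -> (1, 2)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def agent_matches_server(agent_version: str, server_version: str) -> bool:
    """Assumed, simplified rule: agent and server must share major.minor."""
    return major_minor(agent_version) == major_minor(server_version)


def pinned_install_command(version: str) -> list:
    """yum command that asks for an explicit ambari-agent version instead of the newest one."""
    return ["yum", "install", "-y", f"ambari-agent-{version}"]


if __name__ == "__main__":
    # Versions taken from the HeartBeatHandler warning in this thread.
    agent, server = "1.2.5.17", "1.3.0"
    if not agent_matches_server(agent, server):
        print(" ".join(pinned_install_command(server)))
```

The list form can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting and makes the script stop if yum fails.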

    #33734

    Dave
    Moderator

    Hi MF,

    What does the ambari-agent log show for the new host and what does the ambari-server log show?

    Thanks

    Dave
