HDP on Linux – Installation: Problem with install – HDFS Start Failed


This topic contains 18 replies, has 3 voices, and was last updated by  Sasha J 2 years ago.

  • Creator
    Topic
  • #10853

    Hi!
    I have a problem with my Hortonworks deployment. After installation it could not start HDFS. My instance has 14 cores and 8 GB of memory, and it is running CentOS 6.2.
    The first line of my /etc/hosts file is:
    hortonworkstest.hortonworks-test.europewest.internal.mydomain.net hortonworkstest

    And here is the deploy log:

    http://codepad.org/mYWvJeD7

    What should I check? How can I finish my installation?
    Kind regards,
    Paweł Żochowski,
    BI Masters

Viewing 18 replies - 1 through 18 (of 18 total)


  • Author
    Replies
  • #11302

    Sasha J
    Moderator

    This is exactly what needs to be done if you are not managing your own DNS.

    #11299

    Carlos Garza
    Participant

    It looks like the puppet master was checking each client’s PTR record to match hostnames with IP addresses when the client connects. Long story short, if the PTR for the connecting client’s IP doesn’t match its hostname, the puppet master doesn’t trust the client and you’ll see messages like
    Puppet (warning): Denying access: Forbidden request: 50-57-223-95.static.cloud-ips.com(50.57.223.95) access to /run/h1.rackexp.org
    in the HMC logs.
    Here 50-57-223-95.static.cloud-ips.com is the PTR for IP 50.57.223.95.

    In short, it’s not enough for your FQDNs to resolve to their IPs. You must also ensure that the reverse lookups on the IPs match your FQDNs, because puppet uses them to verify the FQDN against your IPs.

    The only way I’ve found to fix this, if you’re not in control of your PTR records, is to avoid using DNS and add the entries to your /etc/hosts file.
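
The forward and reverse lookup check described above can be sketched in Python. This is a minimal illustration, not what Puppet or HMC actually runs: the function names (`check_host`, `report`, `default_forward`, `default_reverse`) are made up for this example, and the case-insensitive name comparison is an assumption about what "matching" means. On a typical Linux setup the system resolver also consults /etc/hosts, so the same check should reflect entries added there.

```python
import socket


def default_forward(fqdn):
    """Return the IPv4 addresses a name resolves to (empty list on failure)."""
    try:
        return socket.gethostbyname_ex(fqdn)[2]
    except OSError:
        return []


def default_reverse(ip):
    """Return the primary name the resolver reports for an IP ('' on failure)."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ""


def normalize(name):
    """Lower-case a host name and drop any trailing dot before comparing."""
    return name.lower().rstrip(".")


def check_host(fqdn, forward=default_forward, reverse=default_reverse):
    """Return (ip, reverse_name, matches) for every address of fqdn.

    The resolver functions are parameters so the logic can be exercised
    without touching the network.
    """
    results = []
    for ip in forward(fqdn):
        name = reverse(ip)
        results.append((ip, name, normalize(name) == normalize(fqdn)))
    return results


def report(hosts):
    """Print one line per address. Needs working name resolution."""
    for fqdn in hosts:
        rows = check_host(fqdn)
        if not rows:
            print(f"{fqdn}: does not resolve")
        for ip, name, ok in rows:
            status = "OK" if ok else "MISMATCH"
            print(f"{fqdn} -> {ip} -> {name or '(no reverse name)'}: {status}")
```

Calling `report(["h0.rackexp.org", "h1.rackexp.org"])` on each node would show whether the name that comes back for an address is the FQDN or a provider-assigned name such as 50-57-223-95.static.cloud-ips.com.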

    #11218

    Carlos Garza
    Participant

    If you’re suggesting that I mangle the HMC installation scripts, then I’ll give it a try.

    #11217

    Sasha J
    Moderator

    There is something wrong with Puppet’s SSL certificates.
    Make sure you can ping all nodes from that .95 machine using puppet.
    Search for “puppet kick” to find the correct syntax.

    Most likely you should reinstall puppet on all nodes.

    #11215

    Carlos Garza
    Participant

    “chkconfig iptables off”

    They’ve been off. And they still report an empty set of rules.

    This is all happening in the alleged “Starting HDFS” phase, after selecting all the services. It’s kind of strange that starting HDFS is triggering an event to install software.
    [root@h1 ~]# uptime
    00:28:59 up 1 min, 1 user, load average: 0.08, 0.04, 0.01
    [root@h1 ~]# chkconfig --list | grep ip | grep table
    ip6tables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    [root@h1 ~]# iptables -L
    Chain INPUT (policy ACCEPT)
    target prot opt source destination

    Chain FORWARD (policy ACCEPT)
    target prot opt source destination

    Chain OUTPUT (policy ACCEPT)
    target prot opt source destination
    [root@h1 ~]#

    iptables was off the whole time.
    The puppet KICK failed.

    #11213

    Carlos Garza
    Participant

    The real issue is that the kick failed.
    [2012:10:17 23:48:50][INFO][PuppetInvoker][PuppetInvoker.php:100][sendKick]: h2.rackexp.org: Kick failed with warning: peer certificate won’t be verified in this SSL session
    warning: peer certificate won’t be verified in this SSL session
    warning: peer certificate won’t be verified in this SSL session
    warning: peer certificate won’t be verified in this SSL session
    Host h1.rackexp.org failed: Error 403 on SERVER: Forbidden request: 50-57-223-95.static.cloud-ips.com(50.57.223.95) access to /run/h1.rackexp.org [save] at line 1
    Host h2.rackexp.org failed: Error 403 on SERVER: Forbidden request: 50-57-223-95.static.cloud-ips.com(50.57.223.95) access to /run/h2.rackexp.org [save] at line 1
    Host h3.rackexp.org failed: Error 403 on SERVER: Forbidden request: 50-57-223-95.static.cloud-ips.com(50.57.223.95) access to /run/h3.rackexp.org [save] at line 1
    Host h4.rackexp.org failed: Error 403 on SERVER: Forbidden request: 50-57-223-95.static.cloud-ips.com(50.57.223.95) access to /run/h4.rackexp.org [save] at line 1
    Triggering h1.rackexp.org
    Triggering h2.rackexp.org
    Triggering h3.rackexp.org
    Triggering h4.rackexp.org
    h1.rackexp.org finished with exit code 2
    h2.rackexp.org finished with exit code 2
    h3.rackexp.org finished with exit code 2
    h4.rackexp.org finished with exit code 2
    Failed: h1.rackexp.org, h2.rackexp.org, h3.rackexp.org, h4.rackexp.org

    [2012:10:17 23:48:50][INFO][PuppetInvoker][PuppetInvoker.php:100][sendKick]: h1.rackexp.org: Kick failed with warning: peer certificate won’t be verified in this SSL session.

    All my nodes are resolvable from DNS. You can resolve them too:
    h0.rackexp.org through h4.rackexp.org.

    If a kick has failed, then I would think the HDFS software probably never made it onto the worker nodes. The yum log shows that ruby puppet was the last thing to be installed.
    warning: peer certificate won’t be verified in this SSL session
    warning: peer certificate won’t be verified in this SSL session
    warning: peer certificate won’t be verified in this SSL session
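
To see at a glance which requester name and address the nodes are rejecting, the 403 lines quoted above can be pulled apart with a small Python script. This is only a sketch: the regular expression is written against the log lines quoted in this thread, and the names `FORBIDDEN`, `forbidden_requests` and `summarize` are invented for the example.

```python
import re
from collections import Counter

# Two of the kick failure lines quoted in this thread, used as sample input.
SAMPLE = """\
Host h1.rackexp.org failed: Error 403 on SERVER: Forbidden request: 50-57-223-95.static.cloud-ips.com(50.57.223.95) access to /run/h1.rackexp.org [save] at line 1
Host h2.rackexp.org failed: Error 403 on SERVER: Forbidden request: 50-57-223-95.static.cloud-ips.com(50.57.223.95) access to /run/h2.rackexp.org [save] at line 1
"""

# Matches "Forbidden request: <name>(<ip>) access to <path>".
FORBIDDEN = re.compile(
    r"Forbidden request: (?P<name>[^\s(]+)\((?P<ip>[\d.]+)\) access to (?P<path>\S+)"
)


def forbidden_requests(log_text):
    """Return (requester_name, requester_ip, path) for each forbidden request."""
    return [(m["name"], m["ip"], m["path"]) for m in FORBIDDEN.finditer(log_text)]


def summarize(log_text):
    """Count forbidden requests per (requester_name, requester_ip) pair."""
    return Counter((name, ip) for name, ip, _ in forbidden_requests(log_text))


if __name__ == "__main__":
    for (name, ip), count in summarize(SAMPLE).items():
        print(f"{count} denied request(s) from {name} ({ip})")
```

If every denied request carries the same provider-assigned name rather than one of the h0 to h4 FQDNs, that points at name resolution for that one address rather than at the individual worker nodes.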

    #11211

    Sasha J
    Moderator

    chkconfig iptables off

    This will fully disable it and it will not report anything.
    At what point do you get the failures? During node discovery and bootstrapping, or during installation and starting of services?
    The tail of the log does not make sense on its own.

    #11210

    Carlos Garza
    Participant

    [root@h0 hmc]# tail -f hmc.log
    [2012:10:17 23:45:00][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:05][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:10][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:15][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:20][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:25][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:30][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:35][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:40][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:45][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:50][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:55][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 1 nodes have reported for txn 3-2-0
    [2012:10:17 23:45:55][WARN][PuppetInvoker][PuppetInvoker.php:344][waitForResults]: Kick timed out, waited 120 seconds
    [2012:10:17 23:45:56][INFO][PuppetInvoker][PuppetInvoker.php:292][genKickWait]: Kick attempt (2/3)
    [2012:10:17 23:45:57][INFO][PuppetInvoker][PuppetInvoker.php:97][sendKick]: h0.rackexp.orgprevious kick still running, will continue to

    I’m running this on a RHEL6 server now and I’m getting the same issue. Also, iptables will report an empty set of rules even when it’s disabled.
    [root@h0 /]# service iptables start
    iptables: Applying firewall rules: [ OK ]
    [root@h0 /]# iptables -L
    Chain INPUT (policy ACCEPT)
    target prot opt source destination

    Chain FORWARD (policy ACCEPT)
    target prot opt source destination

    Chain OUTPUT (policy ACCEPT)
    target prot opt source destination
    [root@h0 /]# service iptables stop
    iptables: Flushing firewall rules: [ OK ]
    iptables: Setting chains to policy ACCEPT: filter [ OK ]
    iptables: Unloading modules: [ OK ]
    [root@h0 /]# iptables -L
    Chain INPUT (policy ACCEPT)
    target prot opt source destination

    Chain FORWARD (policy ACCEPT)
    target prot opt source destination

    Chain OUTPUT (policy ACCEPT)
    target prot opt source destination
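
The point of the two listings above is that chains with policy ACCEPT and no rules do not filter anything, whether or not the iptables service is running. A small Python sketch can make that check mechanical. The function names here are illustrative, and the parser only handles the plain `iptables -L` layout shown in this thread (optionally with a leading `num` column).

```python
# The listing quoted in this thread: three built-in chains, no rules.
EMPTY_RULESET = """\
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
"""


def parse_iptables_list(output):
    """Return {chain: {"policy": str or None, "rules": [str, ...]}}."""
    chains = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split()
        if parts[0] == "Chain" and len(parts) >= 2:
            # Built-in chains print "(policy ACCEPT)"; user-defined chains
            # print "(N references)" and therefore have no policy.
            policy = None
            if len(parts) >= 4 and parts[2] == "(policy":
                policy = parts[3].rstrip(")")
            current = {"policy": policy, "rules": []}
            chains[parts[1]] = current
        elif parts[0] in ("target", "num"):
            continue  # column header line
        elif current is not None:
            current["rules"].append(line)
    return chains


def is_wide_open(output):
    """True if every listed chain has policy ACCEPT and no rules."""
    chains = parse_iptables_list(output)
    return bool(chains) and all(
        c["policy"] == "ACCEPT" and not c["rules"] for c in chains.values()
    )


if __name__ == "__main__":
    print("no filtering in effect:", is_wide_open(EMPTY_RULESET))
```

The same functions can be fed the text of a real `iptables -L` run on each node; anything other than an all-ACCEPT, rule-free result would be worth a closer look before reinstalling.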

    #11209

    Sasha J
    Moderator

    …and then run the installation again.

    #11206

    Sasha J
    Moderator

    It seems the firewall on node x.x.x.95 is still running:

    6. Firewall
    Table: filter
    Chain INPUT (policy ACCEPT)
    num target prot opt source destination

    Chain FORWARD (policy ACCEPT)
    num target prot opt source destination

    Chain OUTPUT (policy ACCEPT)
    num target prot opt source destination

    Please make sure it is disabled, and then do the following
    on HMC node:
    yum erase hmc puppet
    yum install hmc
    service hmc start

    on all other nodes:
    yum erase puppet

    #11197

    Carlos Garza
    Participant

    I’m getting a similar issue, but for me it looks like it’s puppet related.
    Here are some lines from one of the worker nodes (CentOS 6) failing to do something on the puppet master:
    Wed Oct 17 18:48:41 +0000 2012 Puppet (err): Forbidden request: 50-57-223-95.static.cloud-ips.com(50.57.223.95) access to /run/h1.rackexp.org [save] at line 1
    Wed Oct 17 18:48:52 +0000 2012 access[/run] (info): defaulting to no access for 50-57-223-95.static.cloud-ips.com
    Wed Oct 17 18:48:52 +0000 2012 Puppet (warning): Denying access: Forbidden request: 50-57-223-95.static.cloud-ips.com(50.57.223.95) access to /run/h1.rackexp.org [save] at line 1
    Wed Oct 17 18:48:52 +0000 2012 Puppet (err): Forbidden request: 50-57-223-95.static.cloud-ips.com(50.57.223.95) access to /run/h1.rackexp.org [save] at line 1
    Wed Oct 17 18:49:03 +0000 2012 access[/run] (info): defaulting to no access for 50-57-223-95.static.cloud-ips.com
    The checkit script created a strangely named file that I’m tarballing, since I can’t even spell it.
    You can fetch it via
    wget http://utils.rackexp.org/checkoutput.tar.gz

    The strange thing is that these puppet messages were coming in during the HDFS start phase. It was as if puppet hadn’t caught up yet but the HMC app was already trying to start HDFS.
    I have the key correctly installed, as all root accounts on all nodes can log into each other’s root accounts. And I got the host names correct too.
    {
      "hosts": [
        "h0.rackexp.org",
        "h1.rackexp.org",
        "h2.rackexp.org",
        "h3.rackexp.org",
        "h4.rackexp.org"
      ],
      "user": "hadoop",
      "key": "/home/crc/.ssh/id_rsa"
    }

    crc@bork:~/hadoop$ ./exec.py "root" "hostname -f"
    Executing hostname -f
    Connecting to host h0.rackexp.org
    h0.rackexp.org
    Connecting to host h1.rackexp.org
    h1.rackexp.org
    Connecting to host h2.rackexp.org
    h2.rackexp.org
    Connecting to host h3.rackexp.org
    h3.rackexp.org
    Connecting to host h4.rackexp.org
    h4.rackexp.org
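
The same comparison (does each node report the FQDN it is listed under?) can be automated. The following Python sketch is not the exec.py script used above, whose contents are not shown in this thread; it is a hypothetical stand-in that assumes key-based ssh logins already work and that the host list uses the JSON layout from this post. The function names are invented for the example.

```python
import json
import subprocess


def load_hosts(config_text):
    """Return the "hosts" list from a JSON config like the one in this post."""
    return json.loads(config_text)["hosts"]


def remote_fqdn(host, user="root"):
    """Run `hostname -f` on a host over ssh and return the trimmed output.

    BatchMode=yes makes ssh fail instead of prompting for a password.
    """
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", "hostname", "-f"],
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )
    return result.stdout.strip()


def find_mismatches(hosts, lookup=remote_fqdn):
    """Return {listed_name: reported_name} for hosts whose own FQDN differs.

    The lookup function is a parameter so the comparison can be tried out
    without making any ssh connections.
    """
    mismatches = {}
    for host in hosts:
        reported = lookup(host)
        if reported != host:
            mismatches[host] = reported
    return mismatches
```

An empty result from `find_mismatches(load_hosts(config_text))` corresponds to the output above, where every node answers with the name it was contacted under. It says nothing about reverse lookups, which need to be checked separately.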

    #11167

    Sasha J
    Moderator

    This could be a good idea.
    Let us take this offline. I will contact you by e-mail to set this up.

    #11163

    I’m sorry, but it’s still not working. Maybe I could give you SSH access to my machine so you can check it yourself?

    #10970

    Sasha J
    Moderator

    Pawel,
    what do you mean by “can’t change password”?
    You do not need to change anything.

    You can also try the following command:

    sudo check.sh

    It should ask you for the password once, then execute the script as root.

    #10962

    I’ve tried it, but I can’t change that password.
    I’m using CentOS on Windows Azure. Maybe that information helps?

    #10958

    Sasha J
    Moderator

    Pawel,
    you should do:

    sudo su - root

    and give the password.
    This way you will become “root” temporarily and will be able to run the script.

    #10904

    Hi!
    I’ve been trying to run that script, but it asks me for the root password. I only have a user password, and I switch to root with the “sudo su -” command using that same password.
    Is there any way to work around that password prompt?
    Kind regards!

    #10855

    Sasha J
    Moderator

    Pawel,
    Unfortunately, the log you uploaded is incomplete.
    Please use the script from the following post to gather more information:

    http://hortonworks.com/community/forums/topic/hmc-installation-support-help-us-help-you/

    Thank you!
