HDP on Linux – Installation: Error starting HDFS: hadoop cannot connect to server

This topic contains 9 replies, has 2 voices, and was last updated by Sasha J 2 years, 3 months ago.

  • Creator
    Topic
  • #8648

    Rich
    Participant

    During the HMC deployment, the cluster installs but then I get an error at the step where HDFS is supposed to start. Digging through the Puppet log files, I found a command that fails, so I tried running it manually:

    [root@horton ~]# hadoop --config /etc/hadoop/conf dfs -ls /
    12/08/23 21:28:53 INFO ipc.Client: Retrying connect to server: horton.localdomain/10.0.2.15:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)

    After 50 retries it terminates. My hostname is “horton.localdomain”, which is the output of both the hostname and hostname -f commands. The IP address of eth0 is 10.0.2.15, and horton.localdomain resolves to this IP address in /etc/hosts. I can also ssh to root@horton.localdomain without a password.
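
    In case it helps, here is roughly what the basic checks look like on this box (getent and netstat are stock CentOS tools; the empty netstat result matches the retry loop above):

    [root@horton ~]# getent hosts horton.localdomain
    10.0.2.15       horton.localdomain
    [root@horton ~]# netstat -tln | grep 8020
    [root@horton ~]#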

    Any thoughts as to what might be causing this?

    Thanks,
    Rich


  • Author
    Replies
  • #8858

    Sasha J
    Moderator

    Hi Rich,
    On a quick test with a single node on CentOS 5.8, the cluster install within HMC fails when I don’t have a hosts entry with the machine’s IP address. So you will at least have to make changes to the /etc/hosts file. Try the following (the whole sequence is also sketched as one session after these steps):

    -stop the hmc service (service hmc stop)
    -uninstall puppet, which will also uninstall hmc (yum erase puppet)
    -get your current IP address (ifconfig)
    -add that IP address to your /etc/hosts file; the example content of my file is (my example IP is 10.10.10.157):
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    10.10.10.157 localhost.localdomain localhost
    127.0.0.1 localhost.localdomain localhost
    ::1 localhost6.localdomain6 localhost6

    -reboot the machine
    -execute a “ping localhost” to verify that it responds and that the IP address shown is the one you entered in your /etc/hosts file
    -you may also want to ping an outside host to verify you can reach the internet, if you aren’t installing from a local yum repository
    -once all of the above is verified, proceed with installing hmc (yum install hmc)
    -once hmc is installed, proceed to the HMC page and follow the prompts to execute the minimal install (HDFS, MapReduce, Ganglia, Nagios)
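
    Roughly, the same sequence as a single shell session (a sketch; the IP above is my example, so substitute your own, and eth0 is assumed to be your interface):

    [root@horton ~]# service hmc stop
    [root@horton ~]# yum erase puppet                  # also removes hmc
    [root@horton ~]# ifconfig eth0 | grep 'inet addr'  # note the current IP
    [root@horton ~]# vi /etc/hosts                     # add the entry shown above
    [root@horton ~]# reboot
    (log back in after the reboot)
    [root@horton ~]# ping -c 1 localhost               # should answer from the IP you entered
    [root@horton ~]# yum install hmc
    [root@horton ~]# service hmc start                 # then continue in the HMC web UI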

    Let us know if that helps.

    #8826

    Rich
    Participant

    I guess I don’t know how to make my IP address match my DNS address; unfortunately, that’s not in my skill set. I just want this to work on localhost. I modified /etc/hosts to look like this:
    127.0.0.1 localhost.localdomain localhost
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    The output of both “hostname” and “hostname -f” is localhost.localdomain. I start hmc, stop iptables, then run the HMC installer. The cluster installs, but it will not start. I have uploaded the check.sh output file.

    Can I just install this on localhost and not worry about IP addresses and DNS names?

    Thanks!
    Rich

    #8701

    Sasha J
    Moderator

    Can you make your IP address match your DNS address?

    Sasha

    #8700

    Rich
    Participant

    I guess my problem is that I don’t know how to fix the naming and make the local name match DNS. The IP address of eth0 is 10.0.2.15. The hostname seems correct to me:
    # hostname
    horton.localdomain
    # hostname -f
    horton.localdomain

    In /etc/hosts I have:
    10.0.2.15 horton.localdomain

    Is there something I am doing wrong? Thanks for your help!

    #8682

    Sasha J
    Moderator

    Looks like your naming does not work correctly:

    8. Name resolution
    Server: 192.168.0.1
    Address: 192.168.0.1#53

    Non-authoritative answer:
    Name: horton.localdomain
    Address: 69.16.143.31
    Name: horton.localdomain
    Address: 66.152.109.31

    BUT:

    1. Hosts table
    10.0.2.15 horton.localdomain horton
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    Main error is this:

    [2012:08:24 03:27:36][ERROR][ServiceComponent:NAMENODE][ServiceComponent.php:283][start]: Puppet kick failed, no successful nodes
    [2012:08:24 03:27:37][INFO][ClusterMain:TxnId=3][ClusterMain.php:353][]: Completed action=deploy on cluster=horton_cluster, txn=3-0-0, result=-3, error=Failed to start DATANODE with -3 (\’Failed to start NAMENODE with -3 (\’Puppet kick failed on all nodes\’)\’)

    Most likely, you have to fix the naming and make the local name (in /etc/hosts) match DNS.
    As an additional step, I suggest you remove HMC and Puppet, then install HMC again and restart the installation, just to make sure all the SSL certificates are good (they may need to be regenerated because of the name resolution change).
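
    To see the mismatch directly, you can compare the hosts-file answer with the DNS answer, and check that the resolver consults files before dns (this is the stock ordering in /etc/nsswitch.conf on CentOS, so it normally only needs verifying):

    [root@horton ~]# getent hosts horton.localdomain   # goes through the resolver (files first)
    10.0.2.15       horton.localdomain
    [root@horton ~]# nslookup horton.localdomain       # asks DNS only, bypassing /etc/hosts
    ...
    Address: 69.16.143.31
    [root@horton ~]# grep '^hosts' /etc/nsswitch.conf
    hosts:      files dns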

    Thank you!
    Sasha

    #8681

    Rich
    Participant

    I’m still getting the same error. I don’t think the NameNode is starting, and my guess is that it’s an issue with the server name and/or IP address. Here is the error:

    11. Puppet failures
    Fri Aug 24 08:45:56 -0600 2012 /Stage[3]/Hdp-hadoop::Namenode::Service_check/Hdp-hadoop::Exec-hadoop[namenode::service_check]/Hdp::Exec[hadoop --config /etc/hadoop/conf dfs -ls /]/Exec[hadoop --config /etc/hadoop/conf dfs -ls /]/returns (err): change from notrun to 0 failed: hadoop --config /etc/hadoop/conf dfs -ls / returned 255 instead of one of [0] at /etc/puppet/agent/modules/hdp/manifests/init.pp:253
    Fri Aug 24 08:45:56 -0600 2012 /Stage[3]/Hdp-hadoop::Namenode::Service_check/Hdp-hadoop::Exec-hadoop[namenode::service_check]/Hdp::Exec[hadoop --config /etc/hadoop/conf dfs -ls /]/Anchor[hdp::exec::hadoop --config /etc/hadoop/conf dfs -ls /::end] (notice): Dependency Exec[hadoop --config /etc/hadoop/conf dfs -ls /] has failures: true
    Fri Aug 24 08:45:56 -0600 2012 /Stage[3]/Hdp-hadoop::Namenode::Service_check/Hdp-hadoop::Exec-hadoop[namenode::service_check]/Hdp::Exec[hadoop --config /etc/hadoop/conf dfs -ls /]/Anchor[hdp::exec::hadoop --config /etc/hadoop/conf dfs -ls /::end] (warning): Skipping because of failed dependencies
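
    For what it’s worth, the failing resource is just the same service check command from my first post; running it by hand reproduces the 255 exit status the log complains about:

    [root@horton ~]# hadoop --config /etc/hadoop/conf dfs -ls /
    (same ipc.Client retry messages as before, until it gives up)
    [root@horton ~]# echo $?
    255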

    BTW, the check.sh script is awesome! Very helpful in getting right to the error message.

    #8679

    Sasha J
    Moderator

    Exactly.
    Please go ahead and reinstall the cluster, and do not forget to change the selected default on the mount points page (just deselect /dev/mapper and type / in the text field). This is a known bug and it will be fixed in the next release.
    Also, I believe you are running this on a relatively small VM (less than 8 GB of memory), right?
    In that case, make sure you change the HBase Region Server heap size to at least 1024 MB (it is calculated incorrectly on low-memory machines).

    When you make those changes, the installation should go smoothly and complete successfully.

    Thank you!
    Sasha

    #8675

    Rich
    Participant

    Thanks Sasha. I ran the check.sh script and uploaded the resulting file.

    I don’t have any process running on port 8020, so obviously the NameNode didn’t start properly during the install. Your tip about the mount points is important! If you do not change the default values on that step of the HMC installation wizard, the cluster install will fail because the install scripts will try to run a “mkdir” command under /dev/mapper, which can’t be done.
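
    If you want to see what the installer actually configured, the storage directories land in hdfs-site.xml (dfs.name.dir is the Hadoop 1.x property name used in this era of HDP; the grep below is just a quick way to eyeball it):

    [root@horton ~]# grep -A 1 dfs.name.dir /etc/hadoop/conf/hdfs-site.xml
        <name>dfs.name.dir</name>
        <value>...</value>

    If the value points somewhere under /dev/mapper, that is the bad default from the wizard.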

    #8649

    Sasha J
    Moderator

    This means that the NameNode is either not started or not bound to your IP address.
    Could you check whether port 8020 is listed in the “netstat -a” output?
    If it is listed, make sure it is bound to your IP, not to “localhost” (a quick check is sketched below).
    Also, which mount points did you define during the installation? Make sure it is NOT /dev/mapper/xxx.
    Take a look at hmc.log and find which command failed.
    Use the sticky note in http://hortonworks.com/community/forums/topic/hmc-installation-support-help-us-help-you/ ,
    run the script mentioned there, and send the results.
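
    For the bind check, something like this (netstat flags are standard; 8020 is the NameNode IPC port, and the output line is what a healthy node would show):

    [root@horton ~]# netstat -tln | grep 8020
    tcp        0      0 10.0.2.15:8020          0.0.0.0:*               LISTEN

    If the local address column shows 127.0.0.1:8020 instead, the NameNode is bound to localhost only; if the line is missing entirely, it never started.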

    Thank you!
    Sasha
