HDP on Windows – Installation: Install OK, but trouble validating

This topic contains 7 replies, has 3 voices, and was last updated by John Bunch 7 months, 3 weeks ago.

  • Creator
    Topic
  • #26848

    John Bunch
    Member

    I need some help troubleshooting this. I can start all services manually on all nodes, but when I run start_remote_hdp_services.cmd, I get the following error:


    D:\hdp\hadoop>start_remote_hdp_services.cmd
    Master nodes: start hadoop1.sludgebucket.com hadoop2.sludgebucket.com
    0 Master nodes successfully started.
    2 Master nodes failed to start.

    PSComputerName Service Message Status
    -------------- ------- ------- ------
    Connecting to re...
    Connecting to re...

    StartStop-HDPservices : Manually start services on Master nodes then retry
    full cluster start. Exiting.
    At D:\hdp\hadoop\manage_remote_hdp_services.ps1:187 char:26
    + if ($mode -eq "start") { StartStop-HDPservices($mode) }
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (:) [Write-Error], WriteErrorExcep
    tion
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorExceptio
    n,StartStop-HDPServices

    I also find this entry in the hadoop-datanode-HADOOP5.log file:


    2013-06-03 09:41:28,818 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to hadoop1.FloridaKeys.sajes.com/172.16.1.181:8020 failed on local exception: java.io.IOException: An existing connection was forcibly closed by the remote host
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
    at org.apache.hadoop.ipc.Client.call(Client.java:1075)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at com.sun.proxy.$Proxy5.sendHeartbeat(Unknown Source)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:909)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1463)
    at java.lang.Thread.run(Unknown Source)
    Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    (leaving out the rest)

    Here is clusterproperties.txt:


    #Log directory
    HDP_LOG_DIR=d:\hadoop\logs

    #Data directory
    HDP_DATA_DIR=d:\hdp\data

    #Hosts
    NAMENODE_HOST=hadoop1.sludgebucket.com
    SECONDARY_NAMENODE_HOST=hadoop2.sludgebucket.sajes.com
    JOBTRACKER_HOST=hadoop1.sludgebucket.com
    HIVE_SERVER_HOST=hadoop2.sludgebucket.com
    OOZIE_SERVER_HOST=hadoop2.sludgebucket.com
    TEMPLETON_HOST=hadoop2.sludgebucket.com
    SLAVE_HOSTS=hadoop3.sludgebucket.com, hadoop4.sludgebucket.com, hadoop5.sludgebucket.com

    #Database host
    DB_FLAVOR=derby
    DB_HOSTNAME=hadoop2.sludgebucket.com

    #Hive properties
    HIVE_DB_NAME=hivedb
    HIVE_DB_USERNAME=hive_user
    HIVE_DB_PASSWORD=Pa$$w0rd

    #Oozie properties
    OOZIE_DB_NAME=ooziedb
    OOZIE_DB_USERNAME=oozie_user
    OOZIE_DB_PASSWORD=Pa$$w0rd

    I’ve double-checked firewall config and eliminated that as a cause.
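As an independent check alongside the firewall review, a TCP connection attempt from a slave node to the NameNode RPC port can be sketched as below. This is a minimal sketch, not part of HDP: the helper name `can_connect` is hypothetical, and the host and port are taken from the config and log above. A successful connect only shows that the port is reachable; it does not rule out the remote side closing the connection afterwards, which is what the DataNode log reports.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests reachability,
    not whether the Hadoop RPC handshake succeeds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Host from clusterproperties.txt, port from the DataNode log entry.
    namenode = "hadoop1.sludgebucket.com"
    rpc_port = 8020
    status = "reachable" if can_connect(namenode, rpc_port) else "NOT reachable"
    print(f"{namenode}:{rpc_port} is {status}")
```

Running this from each slave host gives a quick per-node view of whether port 8020 on the NameNode accepts connections at all.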

Viewing 7 replies - 1 through 7 (of 7 total)


  • Author
    Replies
  • #33858

    John Bunch
    Member

    Update on this issue:

    The above appears to be due to a PowerShell security issue when installing HDP on standalone (non-domain member) Windows servers. There may be a step or two missing from the installation instructions in section 5.5. I added each machine to a domain and set Group Policy for the domain (as directed by section 5.6), and the above issue disappeared. I also installed version 1.3 and reproduced the same behavior.

    #33602

    Seth Lyubich
    Keymaster

    Hi John,

    I saw a similar issue in HDP 1.1. Can you please try starting the services with the start_local_hdp_service.cmd script instead?

    I also wanted to note that HDP 1.3 is out, which you can try now.

    http://hortonworks.com/products/hdp-windows/

    Hope this helps,

    Thanks,
    Seth

    #33397

    John Bunch
    Member

    Does not appear to be a firewall issue – if I stop Windows Firewall on all nodes I get the same error.

    I’m not sure how to check for an SSL certificate issue. I have not installed an SSL certificate or made any modifications to Apache – nothing is changed from the installation.

    #27395

    tedr
    Moderator

    Hi BlackMamba,

    A web search for the error in the log points to either a firewall issue or an SSL certificate issue. Please check that neither of these is the cause.

    Thanks,
    Ted.

    #27284

    John Bunch
    Member

    Ted,

    It isn’t. That’s a mistake in the above text. I was trying to obfuscate the actual domain name but missed a line. Please don’t tell anyone!

    Any idea on what’s causing the error?

    #27269

    tedr
    Moderator

    Hi John,

    A quick question: I don’t think it is the cause of your issue, but why is the domain for the secondary name node different from all the rest?

    Thanks,
    Ted.
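The kind of inconsistency Ted spotted can also be found mechanically. The following is a rough sketch, assuming the `KEY=value` layout shown in the clusterproperties.txt above; the function names `parse_properties` and `odd_domains` are hypothetical and not part of the HDP installer. It collects every host entry and reports those whose domain differs from the most common one.

```python
from collections import Counter


def parse_properties(text):
    """Parse KEY=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def odd_domains(props):
    """Return (key, host) pairs whose domain differs from the majority."""
    hosts = []
    for key, value in props.items():
        # Assumption: host entries end in _HOST or _HOSTS, or are DB_HOSTNAME.
        if key.endswith(("_HOST", "_HOSTS")) or key == "DB_HOSTNAME":
            hosts += [(key, h.strip()) for h in value.split(",") if h.strip()]
    # The domain is everything after the first dot of the host name.
    domains = Counter(h.split(".", 1)[1] for _, h in hosts if "." in h)
    if not domains:
        return []
    majority = domains.most_common(1)[0][0]
    return [(k, h) for k, h in hosts
            if "." in h and h.split(".", 1)[1] != majority]


SAMPLE = """
#Hosts
NAMENODE_HOST=hadoop1.sludgebucket.com
SECONDARY_NAMENODE_HOST=hadoop2.sludgebucket.sajes.com
JOBTRACKER_HOST=hadoop1.sludgebucket.com
SLAVE_HOSTS=hadoop3.sludgebucket.com, hadoop4.sludgebucket.com, hadoop5.sludgebucket.com
"""

if __name__ == "__main__":
    for key, host in odd_domains(parse_properties(SAMPLE)):
        print(f"{key}: {host}")
```

On the sample lines, taken from the posted config, the only entry reported is SECONDARY_NAMENODE_HOST.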

    #26850

    John Bunch
    Member

    Windows Server 2012 on all nodes, BTW.
