NodeManager registering to ResourceManager with Port 0

This topic contains 4 replies, has 3 voices, and was last updated by Nachiketa Shukla 4 months ago.

  • #53538

    Subroto Sanyal
    Participant

    Hi,

    I am running HDP-2.1.2.0 on a two-node cluster.
    All the Hadoop daemons are running and I can access the web UI for all of them, but I am unable to submit jobs to the cluster. Even a simple Pi job fails with the following exception:

    14/05/13 03:23:21 INFO mapreduce.Job: Job job_1399899195932_0005 failed with state FAILED due to: Application application_1399899195932_0005 failed 2 times due to Error launching appattempt_1399899195932_0005_000002. Got exception: java.net.ConnectException: Call From ip-10-151-121-212/10.151.121.212 to ip-10-73-168-113.ec2.internal:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    	at org.apache.hadoop.ipc.Client.call(Client.java:1414)
    	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    	at com.sun.proxy.$Proxy28.startContainers(Unknown Source)
    	at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
    	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    	at java.lang.Thread.run(Thread.java:662)
    Caused by: java.net.ConnectException: Connection refused
    	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
    	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)

    One thing I observed in the NodeManager logs and UI is that it is registering itself with the ResourceManager with port 0.

    2014-05-13 04:13:38,296 INFO  nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(304)) - Registered with ResourceManager as ip-10-151-121-212.ec2.internal:0 with total resource of <memory:8192, vCores:8>

    Is this a problem with the environment/configuration or with the distribution?

    Cheers,
    Subroto Sanyal

Viewing 4 replies - 1 through 4 (of 4 total)


  • #56118

    Nachiketa Shukla
    Participant

    Seeing exactly the same issue on my 6 node cluster. Using the same version as well.

    #53664

    Thanks, Subroto. Independently, several of our internal teams also ran into this.

    We are actively working on this and will push out a fix as part of the next maintenance release. Thanks for the help.

    #53636

    Subroto Sanyal
    Participant

    Hi Vinod,

    I have tried this only on an EC2 cluster.
    I tried fixing the port and that works, but this was not required previously (HDP-2.0.6), even on EC2.
    Now I am using something like 0.0.0.0:45454, which makes the ContainerManager happy. :-)

    Thanks for the workaround.

    Cheers,
    Subroto Sanyal

    #53634

    Hi Subroto,

    That’s very weird; we haven’t seen that. Is this only on EC2? The only possibility I can see is that the Jetty server inside the NodeManager failed to bind to an ephemeral port by the time of registration (a Jetty bug?).

    A clear work-around is to set yarn.nodemanager.address to, say, 0.0.0.0:12345, where 12345 is a known port. Can you try that?
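
    For reference, a minimal sketch of that setting in yarn-site.xml on each NodeManager host, assuming the usual Hadoop <property> layout. The port 12345 is only the example value from above; any free, known port should do. The stock default for this property is ${yarn.nodemanager.hostname}:0, i.e. an ephemeral port, and the NodeManager has to be restarted for the change to take effect.

    ```xml
    <!-- yarn-site.xml (fragment): pin the NodeManager's container manager
         address to a fixed port instead of the default ephemeral port 0.
         12345 is an example value, not a required one. -->
    <property>
      <name>yarn.nodemanager.address</name>
      <value>0.0.0.0:12345</value>
    </property>
    ```

    After the restart, the "Registered with ResourceManager as ..." line in the NodeManager log should show the fixed port rather than :0.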

    We still need to debug what is happening, but the work-around should unblock you. Let me know what you find. Thanks.
