YARN Forum

NodeManager registering to ResourceManager with Port 0

  • #53538
    Subroto Sanyal
    Participant

    Hi,

    I am running HDP-2.1.2.0 on a two node cluster.
    All the Hadoop daemons are running and I can access the WebUI for all of them but, I am unable to submit jobs to cluster. Even a simple PI job is failing with the following exception:

    14/05/13 03:23:21 INFO mapreduce.Job: Job job_1399899195932_0005 failed with state FAILED due to: Application application_1399899195932_0005 failed 2 times due to Error launching appattempt_1399899195932_0005_000002. Got exception: java.net.ConnectException: Call From ip-10-151-121-212/10.151.121.212 to ip-10-73-168-113.ec2.internal:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    	at org.apache.hadoop.ipc.Client.call(Client.java:1414)
    	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    	at com.sun.proxy.$Proxy28.startContainers(Unknown Source)
    	at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
    	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    	at java.lang.Thread.run(Thread.java:662)
    Caused by: java.net.ConnectException: Connection refused
    	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
    	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
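The `ConnectException: Connection refused` above just means nothing was listening at the target host:port (here, port 0). In case it helps others debug, here is a quick hypothetical sanity check, not part of Hadoop, that mirrors what the ResourceManager's AMLauncher is attempting when it tries to start containers on a node:

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds
    (i.e. a daemon is actually listening there), False on
    'connection refused' and similar errors."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `can_connect("ip-10-73-168-113.ec2.internal", 45454)` from the ResourceManager host (substituting your NodeManager's host and configured port) tells you whether the NodeManager RPC endpoint is reachable at all.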

    One thing I observed in the NodeManager logs and UI is that it is trying to register itself with the ResourceManager with port 0.

    2014-05-13 04:13:38,296 INFO  nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(304)) - Registered with ResourceManager as ip-10-151-121-212.ec2.internal:0 with total resource of <memory:8192, vCores:8>

    Is this a problem with the environment/configuration, or with the distribution?

    Cheers,
    Subroto Sanyal


  • #53634

    Hi Subroto,

    That’s very weird; we haven’t seen that before. Is this only on EC2? The only possibility I can see is that the Jetty server inside the NodeManager failed to bind to an ephemeral port by the time of registration (a Jetty bug?).

    A clear workaround is to set yarn.nodemanager.address to, say, 0.0.0.0:12345, where 12345 is a known free port. Can you try that?

    We still need to debug what is happening, but the workaround should unblock you. Let me know what you find. Thanks.
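    For reference, that workaround corresponds to an entry like the following in yarn-site.xml on each NodeManager host (a sketch only; 45454 is just an example, any known free port works, and the NodeManagers need a restart afterwards):

    ```xml
    <!-- yarn-site.xml: pin the NodeManager RPC address to a fixed port
         instead of letting it pick (or mis-report) an ephemeral one.
         The port 45454 is only an example. -->
    <property>
      <name>yarn.nodemanager.address</name>
      <value>0.0.0.0:45454</value>
    </property>
    ```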

    #53636
    Subroto Sanyal
    Participant

    Hi Vinod,

    I have tried this only on an EC2 cluster.
    Fixing the port works, but this was not required previously (HDP-2.0.6), even on EC2.
    Now I am using something like 0.0.0.0:45454, which makes the ContainerManager happy. :-)

    Thanks for the workaround.

    Cheers,
    Subroto Sanyal

    #53664

    Thanks Subroto. Independently several of our internal teams also ran into this.

    We are actively working on this and will push out a fix as part of the next maintenance release. Thanks for the help.

    #56118
    Nachiketa Shukla
    Participant

    Seeing exactly the same issue on my 6-node cluster, using the same version as well.

