
HDP on Windows – Other Forum

Just who is ?

  • #50838
    Toby Evans

    Hi there

    I’ve had HDP for Windows running for about a year now, building it up into a mini-cluster. We’ve got loads of things running, it’s great.

    The upgrade to HDP2 has brought us a new pleasure – every so often, a job will fail as the data node is unable to reach

    This isn’t consistent. The same nodes can run the same jobs for hours, and it’s fine. Then, about 5-10% of the time, the job will fail because it can’t reach. This IP address falls under the “link-local address” range –

    It’s not one machine that does it – when it happens, it happens to all the nodes trying to run the task. It’s not one particular job; it can be any.

    It’s very odd. I’m considering using an alias for the namenode and defining that via core-site.xml and the hosts file, but that would mean manually updating loads of machines, and I’d rather do anything than that, unless I had to and I knew for certain it would work. Any ideas?
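For reference, the alias workaround Toby describes would look something like this (a sketch only – `namenode-alias`, the IP, and port 8020, HDFS’s default RPC port, are illustrative placeholders, not values from this thread):

```xml
<!-- hosts file on every node
     (C:\Windows\System32\drivers\etc\hosts on Windows):
     10.0.0.10   namenode-alias
-->

<!-- core-site.xml: point the default filesystem at the alias -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-alias:8020</value>
</property>
```

As Toby notes, both the hosts entry and the config change would have to be pushed to every node, which is why he is reluctant.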

    When it goes wrong:

    2014-04-01 11:25:37,242 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
    2014-04-01 11:25:37,257 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
    2014-04-01 11:25:37,257 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: fs.defaultFS;  Ignoring.
    2014-04-01 11:25:37,710 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
    2014-04-01 11:25:37,803 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
    2014-04-01 11:25:37,803 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
    2014-04-01 11:25:37,819 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
    2014-04-01 11:25:37,819 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1395281191997_0547, Ident: (
    2014-04-01 11:25:37,928 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
    2014-04-01 11:25:44,901 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
  • #50840
    Toby Evans

    Here’s the stack trace of the exception:

    2014-04-01 11:27:25,943 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : Call From YEPS72102/ to failed on connection exception: Connection refused: no further information; For more details see:
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
        at java.lang.reflect.Constructor.newInstance(
        at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(
        at $Proxy6.getTask(Unknown Source)
        at org.apache.hadoop.mapred.YarnChild.main(
    Caused by: Connection refused: no further information
        at Method)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(
        at org.apache.hadoop.ipc.Client$Connection.access$2600(
        at org.apache.hadoop.ipc.Client.getConnection(
        ... 4 more
    Robert Molina

    Hi Toby,
    Have you checked your network to see if there are any issues? Are there any errors on your NICs? Are you not using DNS at the moment?


    Toby Evans

    Do you mean doing everything via static IP addresses, rather than named machines?


    Dave

    Hi Toby,

    A 169.254.x.x address means the computer isn’t connected to a network – this is the “link-local” block (169.254.0.0/16). As described in RFC 3927, it is allocated for communication between hosts on a single link. Hosts obtain these addresses by auto-configuration, such as when a DHCP server cannot be found.

    You should check your network configuration and hosts files, and ensure each node can reach every other node via its hostname.
    The machines should also have static IP addresses rather than DHCP (which it looks like you are using, if you are getting 169.254.x.x addresses).
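As a quick way to confirm whether an address falls in that link-local block, Python’s standard `ipaddress` module can classify it (a minimal sketch; the sample addresses are made up, not taken from this thread):

```python
import ipaddress

def is_link_local(addr: str) -> bool:
    """True if addr falls in the RFC 3927 link-local block 169.254.0.0/16."""
    return ipaddress.ip_address(addr).is_link_local

# A 169.254.x.x address means DHCP failed and the host auto-configured itself.
print(is_link_local("169.254.12.34"))  # True
print(is_link_local("10.0.0.10"))      # False
```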



    Toby Evans

    Hi Dave,

    That makes sense, and I’m going to get some static IP addresses. But (Columbo style), just one thing … how can I be reading the updating log from a datanode that is reporting a network failure? Here’s the exception: Call From YEPS56563/ to failed on connection exception: Connection refused:

    So the datanode, yeps56563, still has its IP address – it’s the target, the namenode, that it seems to have “forgotten”. But only temporarily. And all the datanodes do it at the same time. Run the exact same job 30 seconds later, and they can all run fine again. All the while, you can access all the logs about this network failure via the namenode/YARN console on port 8088.

    I’ll get onto my network guys


    Dave

    Hi Toby,

    It’s probably your DHCP server having an issue – which is why all the nodes show it at the same time.
    Without really looking into your configuration and network, it’s hard to say.

    If you use static IPs then you won’t see this issue – static IPs are also a best practice for Hadoop.
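One way to sanity-check the advice above – that every node resolves every other node’s hostname to a stable, non-link-local address – is a small sweep like this (a sketch; the hostname list is a placeholder, not the real cluster’s):

```python
import ipaddress
import socket

def resolve_nodes(hostnames):
    """Resolve each hostname; flag failures and link-local (169.254.x.x) results."""
    report = {}
    for host in hostnames:
        try:
            addr = socket.gethostbyname(host)
            bad = ipaddress.ip_address(addr).is_link_local
            report[host] = (addr, "LINK-LOCAL!" if bad else "ok")
        except socket.gaierror:
            report[host] = (None, "does not resolve")
    return report

# Placeholder list; run this on every node against the full cluster roster.
for host, (addr, status) in resolve_nodes(["localhost"]).items():
    print(f"{host:20} {addr}  {status}")
```

Running it from each node would show immediately whether the namenode intermittently resolves to a 169.254.x.x address, matching the failure pattern Toby describes.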



The topic ‘Just who is ?’ is closed to new replies.
