Home Forums HDFS hdfs error

This topic contains 12 replies, has 6 voices, and was last updated by Francois BORIE 1 year, 8 months ago.

  • Creator
    Topic
  • #11978

    Steve Cohen
    Participant

    I am seeing the following error messages every minute:

    2012-11-09 15:00:18,203 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.104.211.66:50010, storageID=DS-2066043655-10.104.211.66-50010-1352310322398, infoPort=50075, ipcPort=8010):DataXceiver
    java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:83)
    at java.lang.Thread.run(Thread.java:662)
    2012-11-09 15:01:18,013 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.104.211.66:50010, storageID=DS-2066043655-10.104.211.66-50010-1352310322398, infoPort=50075, ipcPort=8010):DataXceiver
    java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:83)
    at java.lang.Thread.run(Thread.java:662)

    I take it this is some monitoring process that is connecting and then exiting?

    Thanks,
    Steve Cohen

Viewing 12 replies - 1 through 12 (of 12 total)

The topic ‘hdfs error’ is closed to new replies.

  • Author
    Replies
  • #19307

    Francois BORIE
    Participant

    Hi Seth,

    OK, thanks for that!

    I look forward to hearing from you.

    Regards,

    François

    #19304

    Seth Lyubich
    Keymaster

    Hi Francois,

    This seems to be related to https://issues.apache.org/jira/browse/AMBARI-1488. We were able to reproduce this issue on a 1.2.2 cluster that was previously upgraded. However, as Larry pointed out, this error message should have no effect on actual system functionality. We will keep you updated.

    Thanks,
    Seth

    #19298

    Francois BORIE
    Participant

    Hi Larry,

    I think I’m already using the latest version of HDP (1.2.2). I’ve just upgraded this week and restarted all the services:

    [root@****** ~]# rpm -qa | egrep -i 'ambari-server|ambari-agent|nagios'
    nagios-common-3.4.4-1.el6.x86_64
    nagios-3.2.3-3.el6.rf.x86_64
    ambari-server-1.2.2.3-1.noarch
    hdp_mon_nagios_addons-1.2.2.3-1.el6.noarch
    nagios-plugins-1.4.9-1.x86_64
    ambari-agent-1.2.2.3-1.x86_64

    Many thanks,

    Regards,

    François

    #19290

    Larry Liu
    Moderator

    Hi, Francois

    Which version of HDP are you using? This issue is not a concern. As you stated, it comes from the Nagios check. Since the Nagios check connects without transferring any data, the DataXceiver throws java.io.EOFException when it tries to read the short value it expects (readShort).

    This issue was fixed in a newer version of HDP. Please upgrade.
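
    The mechanism is easy to see outside of Hadoop. Here is a minimal, self-contained Java sketch (not Hadoop’s actual code; the class name and setup are invented for illustration): one side calls DataInputStream.readShort() on a fresh connection, the way the DataXceiver reads the data transfer version, while the other side connects and closes without sending a byte, the way the Nagios TCP check does:

    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class EofDemo {
        public static void main(String[] args) throws Exception {
            try (ServerSocket server = new ServerSocket(0)) {
                // "Nagios": connect and close immediately, writing no data.
                Thread probe = new Thread(() -> {
                    try (Socket s = new Socket("localhost", server.getLocalPort())) {
                        // no bytes written
                    } catch (Exception ignored) {
                    }
                });
                probe.start();

                // "DataXceiver": expect a 2-byte short as the first thing on the wire.
                try (Socket conn = server.accept();
                     DataInputStream in = new DataInputStream(conn.getInputStream())) {
                    in.readShort();
                } catch (EOFException e) {
                    System.out.println("EOFException, just like the datanode log: " + e);
                }
                probe.join();
            }
        }
    }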

    Thanks
    Larry

    #19161

    Francois BORIE
    Participant

    Hi,

    All my Hadoop cluster nodes are working fine (all Nagios statuses, both for services and for hosts, are green), but I also see those logs in my datanode logs.

    I’ve just noticed it is related to how the datanode is checked by Nagios.

    In fact, the “datanode process down” check is the following:

    /usr/lib64/nagios/plugins/check_tcp -H -p 50010 -w1 -c 1

    Every time I launch it from my Nagios server, it answers:

    TCP OK – 0.001 second response time on port 50010 ….

    And at the same time, it generates this spurious error log on the datanode I just checked (I think it’s because the TCP connection is closed abruptly):

    2013-03-27 17:40:52,101 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.26.103.106:50010, storageID=DS-734936131-10.2******-50010-1363104702203, infoPort=50075, ipcPort=8010):DataXceiver
    java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
    at java.lang.Thread.run(Thread.java:662)

    So it’s not a real “error” log, but it’s not nice because it floods the datanode logs.

    Is there any way to avoid this log trace in the datanode logs? (except stopping Nagios ;-)
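
    (The only workaround I can think of, assuming the stock Hadoop log4j.properties is in use, would be to raise the threshold of the logger that emits these lines; the log prefix shows they come from org.apache.hadoop.hdfs.server.datanode.DataNode. But that would hide genuine DataNode errors too:)

    # in log4j.properties on each datanode; note this also silences real DataNode errors
    log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode=FATAL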

    Many thanks,

    François

    #14748

    tedr
    Member

    Hi Kalyan,

    The Nagios error you posted indicates that it could not connect to the datanode. Make sure that the datanode service is running and that there is nothing blocking communication on its ports.
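
    For example, on the datanode host itself you can check both with something like this (assuming a standard Linux install and the default data transfer port shown in your logs):

    jps | grep DataNode            # is the DataNode JVM running?
    netstat -tlnp | grep 50010     # is anything listening on the data transfer port?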

    Thanks,
    Ted.

    #14736

    Checking for a solution to get this error resolved… Any Nagios-specific steps would be good to know. I do see an alert in Nagios against the node; it reads on the Nagios console as:
    Current Status: CRITICAL (for 0d 1h 0m 42s)
    Status Information: Connection refused
    Performance Data:
    Current Attempt: 3/3 (HARD state)
    Last Check Time: 02-03-2013 02:11:17
    Check Type: ACTIVE
    Check Latency / Duration: 0.114 / 0.020 seconds
    Next Scheduled Check: 02-03-2013 02:12:17
    Last State Change: 02-03-2013 01:11:17
    Last Notification: 02-03-2013 01:12:26 (notification 1)
    Is This Service Flapping? NO (0.00% state change)
    In Scheduled Downtime? NO
    Last Update: 02-03-2013 02:11:56 (0d 0h 0m 3s ago)

    More from the logs:

    2013-02-03 02:03:51,934 ERROR datanode.DataNode (DataXceiver.java:run(223)) – esx179-shwnode002.anant.saama.com:50010:DataXceiver error processing unknown operation src: /10.20.100.114:48845 dest: /10.20.100.113:50010
    java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:50)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:196)
    at java.lang.Thread.run(Thread.java:662)
    2013-02-03 02:03:53,128 INFO logs (Slf4jLog.java:info(67)) – Aliases are enabled

    #12042

    Steve Cohen
    Participant

    Thanks for looking into this. I figured some monitor was causing the error.

    #12014

    Seth Lyubich
    Keymaster

    Hi Steve,

    The error seems to be triggered by the Nagios check ‘DATANODE::Process down’. The reason you don’t see this error on the second host is probably because Nagios checks were not added for the datanode that you started from the command line. This seems to be a minor issue on the functionality side.

    Thanks,
    Seth

    #12010

    Seth Lyubich
    Keymaster

    Hi Steve,

    Please provide full namenode and datanode logs as well as data generated by the script here:

    http://hortonworks.com/community/forums/topic/hmc-installation-support-help-us-help-you/

    Thanks,
    Seth

    #12008

    Steve Cohen
    Participant

    These are datanode logs. The errors occur all the time, whether there is a job running or not. They don’t seem to affect processing. I am using HDP 1.1 and have a two-node cluster that I installed using HMC. I added a datanode/tasktracker/HBase region server on the name node, but the error is on the datanode that was created by Hortonworks, not the one I started up from the command line. I need to figure out how to add my second datanode to HMC.

    #11984

    Seth Lyubich
    Keymaster

    Hi Steve,

    Can you please provide some more information:

    – Are these datanode logs?
    – Are you running any jobs when you see these errors? From the logs it looks like the client is trying to connect every minute and might be failing if it has a different version of Hadoop.
    – Can you please provide configuration information: HDP version, installation method, number of nodes, etc.

    Thanks,
    Seth
