Ambari falsely reporting corrupt blocks

This topic contains 5 replies, has 3 voices, and was last updated by Seth Lyubich 1 year, 6 months ago.

  • #24072

    Hi,

    I recently added a datanode to a three (now four) node cluster. Shortly after installation there was a hardware problem with the new node, which may have corrupted some blocks. Since then, Ambari reports 36 corrupt blocks. This was verified by viewing http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemMetrics.

    Thinking that there was corruption, I ran ‘hadoop fsck /’. To my surprise it reported that the filesystem is healthy and that there are zero corrupt blocks.

    Which metric is the one of record? If it’s fsck as I suspect, how can I reset what Ambari sees?
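For anyone comparing the two sources directly, a quick sketch of both checks (hostname and port are the ones from the JMX URL above; adjust for your cluster):

```shell
# Ambari reads the NameNode's JMX metrics; CorruptBlocks is the
# counter in question.
curl -s 'http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemMetrics' \
  | grep -i corrupt

# fsck walks the live namespace and reports block health per file.
hadoop fsck / | grep -i -E 'corrupt|healthy'
```

The two can legitimately disagree: the JMX counter tracks corrupt *replicas* the NameNode has been told about, while fsck only flags a block as corrupt when *every* replica is bad.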

    Thank you,
    John

Viewing 5 replies - 1 through 5 (of 5 total)

The topic ‘Ambari falsely reporting corrupt blocks’ is closed to new replies.

  • #24099

    Seth Lyubich
    Keymaster

    Hi John,

    I noticed that you are running with a very low heap size on your NameNode (the cluster summary shows it at 100% utilization). Please consider increasing it, and check the NameNode log for any out-of-memory errors. You can also run hadoop fsck with the -blocks option to check for corrupted blocks. If any turn up, the -blocks -files options together will identify the affected files so you can delete them, or you can use hadoop fsck with -delete.
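A sketch of the fsck sequence described above, using the Hadoop 1.x CLI from this thread (run as the HDFS superuser; -delete is destructive, so confirm the file list first):

```shell
# Block-level health report for the whole namespace
hadoop fsck / -blocks

# Same report, but mapping each block back to its owning file,
# so corrupt blocks can be traced to concrete paths
hadoop fsck / -files -blocks

# Only after confirming the affected files are expendable:
# delete files that have corrupt blocks
hadoop fsck / -delete
```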

    Hope this helps,

    Thanks,
    Seth

    #24098

    Larry Liu
    Moderator

    Hi, John

    I don’t see corrupt blocks, but there are some missing replicas. To fix the missing replicas, you can try increasing the replication factor to 3 (the default replication factor is 2); once there are no more under-replicated blocks, reduce the replication factor back to 2.
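A minimal sketch of that raise-then-lower cycle, assuming the whole namespace (/) is the target and the factors are the ones Larry suggests:

```shell
# Raise replication to 3 recursively; -w waits until the
# NameNode has actually created the extra replicas
hadoop fs -setrep -R -w 3 /

# Confirm the under-replicated count has dropped to 0
hadoop dfsadmin -report | grep -i 'under replicated'

# Then restore the configured default of 2
hadoop fs -setrep -R 2 /
```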

    Also, I noticed that the 3 datanodes are not balanced. Please run hadoop balancer, which might help fix the missing replica issue as well.
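The balancer run might look like the following; the threshold value here is an illustrative choice, not from the thread (it is the maximum allowed spread, in percentage points of disk utilization, between datanodes; the default is 10):

```shell
# Move blocks from over-utilized to under-utilized datanodes
# until every node is within 5 points of the cluster average
hadoop balancer -threshold 5
```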

    Hope this helps.

    Larry

    #24082

    Sorry, I ran out of space in the last post.

    Here’s the output of the web UI:

    NameNode ‘namenode:8020’

    Started: Thu May 02 02:38:52 EDT 2013
    Version: 1.1.2.21, r
    Compiled: Thu Jan 10 03:38:39 PST 2013 by jenkins
    Upgrades: There are no upgrades in progress.

    Browse the filesystem
    Namenode Logs
    Cluster Summary

    1952 files and directories, 2125 blocks = 4077 total. Heap Size is 704 MB / 704 MB (100%)
    Configured Capacity : 506.3 GB
    DFS Used : 303.46 GB
    Non DFS Used : 26.36 GB
    DFS Remaining : 176.47 GB
    DFS Used% : 59.94 %
    DFS Remaining% : 34.86 %
    Live Nodes : 3
    Dead Nodes : 0
    Decommissioning Nodes : 0
    Number of Under-Replicated Blocks : 104

    NameNode Storage:

    Storage Directory Type State
    /opt/hadoop/hdfs/namenode IMAGE_AND_EDITS Active

    Thanks for taking a look at this.

    #24080

    Hi Larry,

    After looking through these, I see that it looks like only single replicas on one node are corrupt. How would I fix this? It hasn’t been repaired automatically.

    Note that the new node has more storage than the existing nodes. Here are the outputs:

    hadoop fsck /

    Total size: 122737049452 B
    Total dirs: 422
    Total files: 1533 (Files currently being written: 3)
    Total blocks (validated): 2127 (avg. block size 57704301 B)
    Minimally replicated blocks: 2127 (100.0 %)
    Over-replicated blocks: 0 (0.0 %)
    Under-replicated blocks: 104 (4.889516 %)
    Mis-replicated blocks: 0 (0.0 %)
    Default replication factor: 2
    Average block replication: 2.639398
    Corrupt blocks: 0
    Missing replicas: 512 (9.120057 %)
    Number of data-nodes: 3
    Number of racks: 1
    FSCK ended at Thu May 02 13:52:35 EDT 2013 in 318 milliseconds

    hadoop dfsadmin -report

    Configured Capacity: 543633887229 (506.3 GB)
    Present Capacity: 515326439424 (479.94 GB)
    DFS Remaining: 189485461504 (176.47 GB)
    DFS Used: 325840977920 (303.46 GB)
    DFS Used%: 63.23%
    Under replicated blocks: 104
    Blocks with corrupt replicas: 36
    Missing blocks: 0

    ————————————————-
    Datanodes available: 3 (3 total, 0 dead)

    Name: 10.0.0.1:50010
    Decommission Status : Normal
    Configured Capacity: 144473710591 (134.55 GB)
    DFS Used: 123704954880 (115.21 GB)
    Non DFS Used: 7576473599 (7.06 GB)
    DFS Remaining: 13192282112(12.29 GB)
    DFS Used%: 85.62%
    DFS Remaining%: 9.13%
    Last contact: Thu May 02 13:53:50 EDT 2013

    Name: 10.0.0.2:50010
    Decommission Status : Normal
    Configured Capacity: 254686466047 (237.2 GB)
    DFS Used: 78428827648 (73.04 GB)
    Non DFS Used: 13155704831 (12.25 GB)
    DFS Remaining: 163101933568(151.9 GB)
    DFS Used%: 30.79%
    DFS Remaining%: 64.04%
    Last contact: Thu May 02 13:53:48 EDT 2013

    Name: 10.0.0.3:50010
    Decommission Status : Normal
    Configured Capacity: 144473710591 (134.55 GB)
    DFS Used: 123707195392 (115.21 GB)
    Non DFS Used: 7575269375 (7.06 GB)
    DFS Remaining: 13191245824(12.29 GB)
    DFS Used%: 85.63%
    DFS Remaining%: 9.13%
    Last contact: Thu May 02 13:53:50 EDT 2013

    There’s a lot of output from hadoop dfsadmin -metasave, so I just included a couple of entries. All of the entries look like one of the two below.

    1958 files and directories, 2142 blocks = 4100 total
    Live Datanodes: 3
    Dead Datanodes: 0
    Metasave: Blocks waiting for replication: 104

    /user/hive/.staging/job_201304112012_0013/job.jar: blk_-9053534125886728106_3656 (replicas: l: 3 d: 0 c: 0 e: 0) 10.0.0.1:50010 : 10.0.0.2:50010 : 10.0.0.3:50010 :

    /mapred/history/done/version-1/namenode_1365716316640_/2013/04/11/000000/job_201304111738_0001_1365716339723_ambari%5Fqa_word+count: blk_365653737161114802_3075 (replicas: l: 2 d: 0 c: 1 e: 0) 10.0.0.1:50010 : 10.0.0.2:50010(corrupt) : 10.0.0.3:50010 :

    Metasave: Blocks being replicated: 0
    Metasave: Blocks 0 waiting deletion from 0 datanodes
    Metasave: Number of datanodes:3

    #24076

    Larry Liu
    Moderator

    Hi, John

    Can you please provide the output of the following:

    hadoop fsck /
    hadoop dfsadmin -report
    hadoop dfsadmin -metasave output
    and the namenode web GUI

    Thanks
    Larry
