
HDFS Forum

Ambari falsely reporting corrupt blocks

  • #24072


    I recently added a datanode to a three (now four) node cluster. Shortly after installation there was a hardware problem with the new node, which may have corrupted some blocks. Since then, Ambari reports 36 corrupt blocks. This was verified by viewing http://[namenode]:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemMetrics.

    Thinking that there was corruption, I ran ‘hadoop fsck /’. To my surprise, it reported that the filesystem is healthy and that there are zero corrupt blocks.

    Which metric is the one of record? If it’s fsck as I suspect, how can I reset what Ambari sees?

    Thank you,

  • Author
  • #24076
    Larry Liu

    Hi John,

    Can you please provide the output of the following:

    hadoop fsck /
    hadoop dfsadmin -report
    hadoop dfsadmin -metasave output
    and the namenode web GUI



    Hi Larry,

    After looking through these, it appears that only single replicas on one node are corrupt. How can I fix this? It hasn’t been repaired automatically.

    Note that the new node has more storage than the existing nodes. Here are the outputs:

    hadoop fsck /

    Total size: 122737049452 B
    Total dirs: 422
    Total files: 1533 (Files currently being written: 3)
    Total blocks (validated): 2127 (avg. block size 57704301 B)
    Minimally replicated blocks: 2127 (100.0 %)
    Over-replicated blocks: 0 (0.0 %)
    Under-replicated blocks: 104 (4.889516 %)
    Mis-replicated blocks: 0 (0.0 %)
    Default replication factor: 2
    Average block replication: 2.639398
    Corrupt blocks: 0
    Missing replicas: 512 (9.120057 %)
    Number of data-nodes: 3
    Number of racks: 1
    FSCK ended at Thu May 02 13:52:35 EDT 2013 in 318 milliseconds
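As a quick sanity check, the percentages in the fsck summary are internally consistent; a short sketch in Python, using the figures exactly as reported above (the average-replication value is itself rounded, so the last digit can drift):

```python
# Figures copied from the fsck summary above.
total_blocks = 2127
under_replicated = 104
missing_replicas = 512
avg_replication = 2.639398

# Under-replicated percentage is relative to validated blocks.
print(round(100 * under_replicated / total_blocks, 6))  # ~4.889516

# Missing-replica percentage is relative to the total replica count
# (blocks x average replication, roughly 5614 replicas here).
total_replicas = total_blocks * avg_replication
print(round(100 * missing_replicas / total_replicas, 6))  # ~9.12006 (fsck prints 9.120057)
```

Note that the summary shows zero corrupt blocks even though replicas are under-replicated and missing; fsck tracks those as separate conditions.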

    hadoop dfsadmin -report

    Configured Capacity: 543633887229 (506.3 GB)
    Present Capacity: 515326439424 (479.94 GB)
    DFS Remaining: 189485461504 (176.47 GB)
    DFS Used: 325840977920 (303.46 GB)
    DFS Used%: 63.23%
    Under replicated blocks: 104
    Blocks with corrupt replicas: 36
    Missing blocks: 0

    Datanodes available: 3 (3 total, 0 dead)

    Decommission Status : Normal
    Configured Capacity: 144473710591 (134.55 GB)
    DFS Used: 123704954880 (115.21 GB)
    Non DFS Used: 7576473599 (7.06 GB)
    DFS Remaining: 13192282112(12.29 GB)
    DFS Used%: 85.62%
    DFS Remaining%: 9.13%
    Last contact: Thu May 02 13:53:50 EDT 2013

    Decommission Status : Normal
    Configured Capacity: 254686466047 (237.2 GB)
    DFS Used: 78428827648 (73.04 GB)
    Non DFS Used: 13155704831 (12.25 GB)
    DFS Remaining: 163101933568(151.9 GB)
    DFS Used%: 30.79%
    DFS Remaining%: 64.04%
    Last contact: Thu May 02 13:53:48 EDT 2013

    Decommission Status : Normal
    Configured Capacity: 144473710591 (134.55 GB)
    DFS Used: 123707195392 (115.21 GB)
    Non DFS Used: 7575269375 (7.06 GB)
    DFS Remaining: 13191245824(12.29 GB)
    DFS Used%: 85.63%
    DFS Remaining%: 9.13%
    Last contact: Thu May 02 13:53:50 EDT 2013
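The per-node numbers in the report above show a clear imbalance; computing DFS Used% from the raw byte counts (a sketch, with the nodes in the order printed above — the 237.2 GB node is presumably the newly added, larger one mentioned earlier):

```python
# (configured capacity, DFS used) in bytes, per datanode, from the report above.
nodes = [
    (144473710591, 123704954880),  # 134.55 GB node
    (254686466047, 78428827648),   # 237.2 GB node (likely the new, larger one)
    (144473710591, 123707195392),  # 134.55 GB node
]
for capacity, used in nodes:
    print(f"{100 * used / capacity:.2f}%")  # 85.62%, 30.79%, 85.63%
```

Two nodes are above 85% full while the largest sits near 31% used, which is the kind of skew the HDFS balancer is meant to correct.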

    There’s a lot of output from hadoop dfsadmin -metasave output, so I’ve included just a couple of representative lines; every entry looks like one of the two below.

    1958 files and directories, 2142 blocks = 4100 total
    Live Datanodes: 3
    Dead Datanodes: 0
    Metasave: Blocks waiting for replication: 104

    /user/hive/.staging/job_201304112012_0013/job.jar: blk_-9053534125886728106_3656 (replicas: l: 3 d: 0 c: 0 e: 0) : : :

    /mapred/history/done/version-1/namenode_1365716316640_/2013/04/11/000000/job_201304111738_0001_1365716339723_ambari%5Fqa_word+count: blk_365653737161114802_3075 (replicas: l: 2 d: 0 c: 1 e: 0) : : :

    Metasave: Blocks being replicated: 0
    Metasave: Blocks 0 waiting deletion from 0 datanodes
    Metasave: Number of datanodes:3
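For reference, the per-block flags in the metasave lines report replica states from the NameNode's replica accounting: l = live, d = decommissioned, c = corrupt, e = excess. The second sample above is the telling one — two live replicas plus one corrupt replica. A small parsing sketch:

```python
import re

# Second metasave sample from above: 2 live replicas, 1 corrupt replica.
line = "blk_365653737161114802_3075 (replicas: l: 2 d: 0 c: 1 e: 0) : : :"

m = re.search(r"l: (\d+) d: (\d+) c: (\d+) e: (\d+)", line)
live, decommissioned, corrupt, excess = map(int, m.groups())
print(live, corrupt)  # 2 1
```

Because that block still has live replicas, fsck does not count it as corrupt, even though the corrupt-replica counter does.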


    Sorry, I ran out of space in the last post.

    Here’s the output of the web UI:

    NameNode ‘namenode:8020’

    Started: Thu May 02 02:38:52 EDT 2013
    Version:, r
    Compiled: Thu Jan 10 03:38:39 PST 2013 by jenkins
    Upgrades: There are no upgrades in progress.

    Browse the filesystem
    Namenode Logs
    Cluster Summary

    1952 files and directories, 2125 blocks = 4077 total. Heap Size is 704 MB / 704 MB (100%)
    Configured Capacity : 506.3 GB
    DFS Used : 303.46 GB
    Non DFS Used : 26.36 GB
    DFS Remaining : 176.47 GB
    DFS Used% : 59.94 %
    DFS Remaining% : 34.86 %
    Live Nodes : 3
    Dead Nodes : 0
    Decommissioning Nodes : 0
    Number of Under-Replicated Blocks : 104

    NameNode Storage:

    Storage Directory Type State
    /opt/hadoop/hdfs/namenode IMAGE_AND_EDITS Active

    Thanks for taking a look at this.

    Larry Liu

    Hi John,

    I don’t see corrupt blocks, but there are some missing replicas. To fix the missing replicas, you can try increasing the replication factor to 3 (the default replication factor is 2); once there are no more under-replicated blocks, reduce it back to 2.
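The replication bump described above would look something like this (the path / is illustrative — you could instead target only the under-replicated paths fsck listed; -w waits for replication to finish, which can take a while):

```shell
# Raise the replication factor to 3 recursively and wait for completion.
hadoop fs -setrep -R -w 3 /

# Later, once 'hadoop dfsadmin -report' shows 0 under-replicated blocks,
# drop it back to the configured default of 2.
hadoop fs -setrep -R 2 /
```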

    Also, I noticed the 3 datanodes are not balanced. Please run hadoop balancer, which might help fix the missing-replica issue as well.

    Hope this helps.
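As to why Ambari and fsck disagree at all: the NameNode counter Ambari surfaces (“Blocks with corrupt replicas”, the CorruptBlocks JMX value) counts every block that has at least one corrupt replica, while fsck marks a block corrupt only when no healthy replica remains. A toy model of the distinction (illustrative block names only):

```python
# Toy model: each block maps to the states of its replicas.
blocks = {
    "blk_A": ["live", "live", "corrupt"],  # like the metasave sample (l: 2, c: 1)
    "blk_B": ["corrupt", "corrupt"],       # no healthy replica left
    "blk_C": ["live", "live"],
}

# NameNode-style counter: any corrupt replica makes the block count.
with_corrupt_replicas = sum(1 for replicas in blocks.values() if "corrupt" in replicas)

# fsck-style counter: only blocks with no live replica at all.
fsck_corrupt = sum(1 for replicas in blocks.values() if "live" not in replicas)

print(with_corrupt_replicas, fsck_corrupt)  # 2 1
```

On that reading, the 36 blocks Ambari reports would each still have a good replica, which would explain why fsck calls the filesystem healthy.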


    Seth Lyubich

    Hi John,

    I noticed that your NameNode is running with a very small heap, and it is already full (704 MB of 704 MB in the Cluster Summary). Please consider increasing it and check the log for any out-of-memory issues. Also, you can run hadoop fsck / -files -blocks to check for any corrupted blocks. If you find issues, you can identify the affected files from that output and delete them, or use hadoop fsck / -delete.

    Hope this helps,


The topic ‘Ambari falsely reporting corrupt blocks’ is closed to new replies.
