HDFS and under-replicated blocks


This topic contains 4 replies, has 3 voices, and was last updated by  Francois BORIE 2 years, 4 months ago.

  • Creator
  • #19163

    Francois BORIE


    I have a running cluster with 3 live datanodes and a default HDFS replication factor of 3. There are only 111 blocks in total on my datanodes, yet 45 blocks are still reported as “under-replicated”, even after leaving the cluster running for several days.

    I don’t understand why, because the namenode should automatically handle this replication.

    But are these blocks really under-replicated?

    I’ve seen some threads on the web suggesting that this could be a “display” bug in all Hadoop 0.20 versions (for example, this one: http://stackoverflow.com/questions/7997587/under-replicated-blocks-count-is-inaccurate-buy-why)

    Do you agree with that? Or should I always have 0 under-replicated blocks?

    Many thanks for your help,



Viewing 4 replies - 1 through 4 (of 4 total)

The topic ‘HDFS and under-replicated blocks’ is closed to new replies.

  • Author
  • #19302

    Francois BORIE

    Hi Larry,

    Thanks for that confirmation. You’re correct: my 3 datanodes are in the same rack (cf. the output of the hadoop fsck command I sent to Abdelrhaman).

    I think I will wait for Ambari to become rack-aware (I’ve seen it’s on your roadmap: AMBARI-645) before I start playing with those parameters.





    Larry Liu

    Hi, Francois

    What is the topology of your cluster? If all 3 datanodes are in the same rack, the under-replication issue can occur. I recommend using a topology script to place the 3 datanodes logically in 2 racks.
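As an illustration of the topology script mentioned above, here is a minimal sketch. Hadoop invokes such a script with one or more datanode IP addresses or hostnames as arguments and expects one rack path per argument on standard output. The hostnames `datanode1` to `datanode3` and the rack names are hypothetical placeholders. In Hadoop 1.x the script location is typically set through the `topology.script.file.name` property in core-site.xml, but this should be checked against the version in use.

```shell
#!/bin/bash
# Sketch of a rack topology script. The hostnames and rack names below are
# hypothetical placeholders; replace them with the real datanode addresses.

# Print the rack path for a single host or IP address.
resolve_rack() {
  case "$1" in
    datanode1|datanode2) echo "/rack1" ;;
    datanode3)           echo "/rack2" ;;
    *)                   echo "/default-rack" ;;
  esac
}

# Hadoop may pass several hosts in one call; answer one line per argument.
for host in "$@"; do
  resolve_rack "$host"
done
```

Unknown hosts fall back to a single default rack in this sketch, so a typo in the mapping does not make the script fail outright.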



    Francois BORIE

    Hi Abdelrhaman,

    Thanks for your answer.

    You will find below the output of the commands you asked for:

    -bash-4.1$ hadoop fsck / -locations -blocks -files | grep -i -C6 miss
    Over-replicated blocks: 0 (0.0 %)
    Under-replicated blocks: 45 (42.056076 %)
    Mis-replicated blocks: 0 (0.0 %)
    Default replication factor: 3
    Average block replication: 3.0
    Corrupt blocks: 0
    Missing replicas: 315 (98.130844 %)
    Number of data-nodes: 3
    Number of racks: 1
    FSCK ended at Thu Mar 28 11:09:08 CET 2013 in 1696 milliseconds

    The filesystem under path ‘/’ is HEALTHY
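The summary figures above can be cross-checked with a little arithmetic. The sketch below reads an fsck summary on standard input and prints the average number of missing replicas per under-replicated block; the function name `missing_per_block` is made up for illustration. For the output above, 315 missing replicas over 45 blocks gives 7 per block. With 3 replicas already present, that would be consistent with those blocks requesting a replication factor of 10 on a cluster that has only 3 datanodes. This reading is an inference from the numbers, not something confirmed in this thread.

```shell
# Sketch: average missing replicas per under-replicated block,
# computed from the summary lines of `hadoop fsck` read on stdin.
missing_per_block() {
  awk -F':[ \t]*' '
    # The first word after the colon is the count; the rest is the percentage.
    /Under-replicated blocks/ { split($2, a, " "); under = a[1] }
    /Missing replicas/        { split($2, b, " "); missing = b[1] }
    END {
      if (under > 0) print missing / under
      else print 0
    }
  '
}
```

Assuming the function is defined in the current shell, it could be used as `hadoop fsck / | missing_per_block`.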

    -bash-4.1$ hadoop version
    Subversion -r
    Compiled by jenkins on Thu Jan 10 03:38:39 PST 2013
    From source with checksum ce0aa0de785f572347f1afee69c73861

    Many thanks,





    Hi François,

    How is your day so far? It is possible that this issue is a bug, but let us find out more about it. From the command line, please run the following as the hdfs user on the namenode.
    # hadoop fsck / -locations -blocks -files | grep -i -C6 miss
    # hadoop version
    Please post the output of the commands in the forum.

