HDFS and under-replicated blocks

This topic contains 4 replies, has 3 voices, and was last updated by Francois BORIE 1 year, 6 months ago.

  • #19163

    Francois BORIE
    Participant

    Hello,

    I have a running cluster with 3 live datanodes and the default HDFS replication factor of 3. There are only 111 blocks in total on my datanodes, yet 45 of them are still reported as “under-replicated”, even after leaving the cluster running for several days.

    I don’t understand why, because the namenode should automatically handle this replication.

    But are these blocks really under-replicated?

    I’ve seen some threads on the web suggesting that this can be a “display” bug in all Hadoop 0.20 versions (for example: http://stackoverflow.com/questions/7997587/under-replicated-blocks-count-is-inaccurate-buy-why).

    Do you agree with that? Or should I always have 0 under-replicated blocks?

    Many thanks for your help,

    Regards,

    François

Viewing 4 replies - 1 through 4 (of 4 total)

The topic ‘HDFS and under-replicated blocks’ is closed to new replies.

  • #19302

    Francois BORIE
    Participant

    Hi Larry,

    Thanks for confirming. You are right: my 3 datanodes are all in the same rack (see the output of the hadoop fsck command I sent to Abdelrhaman).

    I think I will wait for Ambari to become rack-aware (I’ve seen it’s on your roadmap as AMBARI-645) before I start playing with those parameters.

    Thanks,

    Regards,

    François

    #19295

    Larry Liu
    Moderator

    Hi, Francois

    What is the topology of your cluster? If all 3 datanodes are in the same rack, the under-replication issue can occur. I recommend using a topology script to place the 3 datanodes logically in 2 racks.

    Thanks
    Larry
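
For readers who want to try Larry's suggestion, the following is a minimal sketch of what a topology script could look like. Hadoop invokes such a script with one or more datanode IP addresses or hostnames as arguments and expects one rack path per argument on standard output. The IP addresses and rack names below are hypothetical placeholders, not values from this thread. In Hadoop 1.x the script location is configured through the `topology.script.file.name` property in core-site.xml.

```shell
#!/bin/bash
# Hypothetical topology script: maps each datanode address to a rack path.
# The addresses and rack names are placeholders, not taken from this thread.

resolve_rack() {
  case "$1" in
    10.0.0.1|10.0.0.2) echo "/rack1" ;;
    10.0.0.3)          echo "/rack2" ;;
    *)                 echo "/default-rack" ;;
  esac
}

# Hadoop may pass several addresses in one call; print one rack per argument.
for node in "$@"; do
  resolve_rack "$node"
done
```

The fallback to `/default-rack` keeps the script from failing on an address it does not know, at the cost of silently grouping unknown nodes together.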

    #19257

    Francois BORIE
    Participant

    Hi Abdelrhaman,

    Thanks for your answer.

    You will find below the output of the commands you asked for:

    -bash-4.1$ hadoop fsck / -locations -blocks -files | grep -i -C6 miss
    Over-replicated blocks: 0 (0.0 %)
    Under-replicated blocks: 45 (42.056076 %)
    Mis-replicated blocks: 0 (0.0 %)
    Default replication factor: 3
    Average block replication: 3.0
    Corrupt blocks: 0
    Missing replicas: 315 (98.130844 %)
    Number of data-nodes: 3
    Number of racks: 1
    FSCK ended at Thu Mar 28 11:09:08 CET 2013 in 1696 milliseconds

    The filesystem under path ‘/’ is HEALTHY

    -bash-4.1$ hadoop version
    Hadoop 1.1.2.21
    Subversion -r
    Compiled by jenkins on Thu Jan 10 03:38:39 PST 2013
    From source with checksum ce0aa0de785f572347f1afee69c73861

    Many thanks,

    Regards,

    François
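
One way to read the fsck summary above is to divide the missing replicas by the under-replicated blocks. The sketch below does that arithmetic with the figures from the output. The step from "7 missing per block" to "target replication of 10" is an assumption: it presumes each affected block already has one replica on each of the 3 datanodes, which the reported average block replication of 3.0 is consistent with but does not prove.

```shell
# Figures copied from the fsck summary in this post.
under_replicated=45
missing_replicas=315
live_datanodes=3

# Average number of replicas each under-replicated block is short by.
missing_per_block=$(( missing_replicas / under_replicated ))

# Assumption: every affected block already has one replica per datanode,
# so its target replication is the live copies plus the shortfall.
implied_target=$(( live_datanodes + missing_per_block ))

echo "missing per block: $missing_per_block"
echo "implied target replication: $implied_target"
```

An implied target of 10 would match the Hadoop 1.x default of `mapred.submit.replication`, which applies to MapReduce job files. The thread does not show which files are affected, so this remains a possible explanation rather than a confirmed one.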

    #19164

    abdelrahman
    Moderator

    Hi François,

    How is your day so far? This issue could be a bug, but let us find out more about it first. Please run the following commands from the command line as the hdfs user on the namenode:
    # hadoop fsck / -locations -blocks -files | grep -i -C6 miss
    # hadoop version
    Please post the output of the commands in the forum.

    Thanks
    -Abdelrhaman
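
The grep above keeps only the summary section of the fsck report. As a complement, here is a small sketch of a filter that keeps the per-file lines instead, so the affected paths and their target replication become visible. The helper name `list_under_replicated` is made up for illustration, and the sketch assumes fsck prints per-file lines of the form `<path>:  Under replicated <block>. Target Replicas is N but found M replica(s).`

```shell
# Hypothetical helper: filter fsck output (read from stdin) down to the
# per-file "Under replicated" lines. The summary line is spelled
# "Under-replicated" with a hyphen, so it is not matched.
list_under_replicated() {
  grep -i "under replicated"
}

# Usage on a live cluster (run as the hdfs user):
#   hadoop fsck / -files -blocks | list_under_replicated
```

If the listed files turn out to have a target replication higher than the number of datanodes, `hadoop fs -setrep <rep> <path>` can lower it for those paths.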
