HDFS Forum

HDFS and under-replicated blocks

  • #19163
    Francois BORIE
    Participant

    Hello,

    I have a running cluster with 3 live datanodes and a default HDFS replication factor of 3. There are only 111 blocks in total on my datanodes, but 45 blocks are still reported as “under-replicated”, even after leaving the cluster running for several days.

    I don’t understand why, because the namenode should automatically handle this replication.

    But are these blocks really under-replicated?

    I’ve seen some threads on the web suggesting that this may be a “display” bug in all Hadoop 0.20 versions (for example: http://stackoverflow.com/questions/7997587/under-replicated-blocks-count-is-inaccurate-buy-why)

    Do you agree with that? Or should I always have 0 under-replicated blocks?

    Many thanks for your help,

    Regards,

    François

  • #19164
    abdelrahman
    Moderator

    Hi François,

    How is your day so far? It is possible that this is a bug, but let us find out more about the issue first. Please run the following commands from the command line as the hdfs user on the namenode:
    # hadoop fsck / -locations -blocks -files | grep -i -C6 miss
    # hadoop version
    Please post the output of both commands in the forum.

    Thanks
    -Abdelrhaman
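
Note that `grep -i -C6 miss` keeps only the summary lines around "Missing replicas". To see which files own the under-replicated blocks, the per-file lines of the full `hadoop fsck / -files -blocks -locations` output can be filtered instead. The sketch below assumes those lines have the form `<path>:  Under replicated <block>. Target Replicas is N but found M replica(s).`; that format, the `under_replicated` helper, and the sample path and block ID are illustrative assumptions, not output from this cluster.

```python
import re

# Assumed shape of a per-file fsck line reporting an under-replicated block.
UNDER_REPLICATED = re.compile(
    r"^(?P<path>\S+):\s+Under replicated (?P<block>\S+)\."
    r" Target Replicas is (?P<target>\d+) but found (?P<found>\d+) replica"
)

def under_replicated(lines):
    """Return (path, block, target, found) for each matching fsck line."""
    hits = []
    for line in lines:
        match = UNDER_REPLICATED.match(line.strip())
        if match:
            hits.append((
                match.group("path"),
                match.group("block"),
                int(match.group("target")),
                int(match.group("found")),
            ))
    return hits

# Hypothetical sample lines, for illustration only.
sample = [
    "/user/example/job.jar:  Under replicated blk_1234567890_1001."
    " Target Replicas is 10 but found 3 replica(s).",
    "/user/example/data.txt 1024 bytes, 1 block(s):  OK",
]

for path, block, target, found in under_replicated(sample):
    print(path, block, "target=%d" % target, "found=%d" % found)
```

Comparing `target` with the number of live datanodes shows whether a file simply requests more replicas than the cluster can hold.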

    #19257
    Francois BORIE
    Participant

    Hi Abdelrhaman,

    Thanks for your answer.

    You will find below the output of the commands you asked for:

    -bash-4.1$ hadoop fsck / -locations -blocks -files | grep -i -C6 miss
    Over-replicated blocks: 0 (0.0 %)
    Under-replicated blocks: 45 (42.056076 %)
    Mis-replicated blocks: 0 (0.0 %)
    Default replication factor: 3
    Average block replication: 3.0
    Corrupt blocks: 0
    Missing replicas: 315 (98.130844 %)
    Number of data-nodes: 3
    Number of racks: 1
    FSCK ended at Thu Mar 28 11:09:08 CET 2013 in 1696 milliseconds

    The filesystem under path ‘/’ is HEALTHY
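
The summary figures above can be cross-checked with a little arithmetic. The sketch below parses the posted numbers and divides the missing replicas by the under-replicated blocks. The `parse_summary` helper is hypothetical, and the "implied target" assumes that each under-replicated block currently has the average of 3 live replicas, which the output does not state directly.

```python
import re

# Summary lines copied from the fsck output posted in this thread.
FSCK_SUMMARY = """\
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 45 (42.056076 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 315 (98.130844 %)
Number of data-nodes: 3
"""

def parse_summary(text):
    """Map each 'Label: value ...' line to the first number after the colon."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*([0-9.]+)", line)
        if match:
            fields[match.group(1).strip()] = float(match.group(2))
    return fields

fields = parse_summary(FSCK_SUMMARY)

# Average number of replicas missing per under-replicated block.
per_block = fields["Missing replicas"] / fields["Under-replicated blocks"]

# Assumption: each of these blocks already has the average 3 live replicas.
implied_target = fields["Average block replication"] + per_block

print("missing replicas per under-replicated block:", per_block)
print("implied target replication:", implied_target)
```

With these numbers, each under-replicated block is missing 7 replicas on average, which would imply a target replication of 10 under the stated assumption. A cluster with 3 datanodes cannot hold more than 3 replicas of a block, so this reading would point to files that request a replication factor higher than the default of 3, although nothing in the posted output confirms it.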

    -bash-4.1$ hadoop version
    Hadoop 1.1.2.21
    Subversion -r
    Compiled by jenkins on Thu Jan 10 03:38:39 PST 2013
    From source with checksum ce0aa0de785f572347f1afee69c73861

    Many thanks,

    Regards,

    François

    #19295
    Larry Liu
    Moderator

    Hi, Francois

    What is the topology of your cluster? If all 3 datanodes are in the same rack, this under-replication issue can occur. I recommend using a topology script to place the 3 datanodes logically in 2 racks.

    Thanks
    Larry
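
As a rough illustration of the topology script suggested above: Hadoop calls the script with one or more datanode IP addresses or hostnames as arguments and reads one rack path per argument from its standard output. In Hadoop 1.x the script is referenced by the `topology.script.file.name` property in core-site.xml. The IP addresses and rack names below are hypothetical placeholders, and the mapping is a minimal sketch rather than a recommended layout.

```python
#!/usr/bin/env python
import sys

# Hypothetical datanode addresses, split logically across two racks.
RACK_BY_HOST = {
    "10.0.0.11": "/rack1",
    "10.0.0.12": "/rack1",
    "10.0.0.13": "/rack2",
}

# Rack path returned for hosts that are not listed in the table.
DEFAULT_RACK = "/default-rack"

def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]

if __name__ == "__main__":
    # Hadoop passes the hosts as command-line arguments and parses
    # whitespace-separated rack paths from standard output.
    print(" ".join(resolve(sys.argv[1:])))
```

Run by hand, for example as `python topology.py 10.0.0.11 10.0.0.13` (the file name is arbitrary), the script should print one rack path per address, which makes it easy to check before pointing the namenode at it.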

    #19302
    Francois BORIE
    Participant

    Hi Larry,

    Thanks for the confirmation. You’re correct: my 3 datanodes are all in the same rack (cf. the output of the hadoop fsck command I sent to Abdelrhaman).

    I think I will wait for Ambari to become rack-aware (I’ve seen it’s on your roadmap: AMBARI-645) before I start playing with those parameters.

    Thanks,

    Regards,

    François

The topic ‘HDFS and under-replicated blocks’ is closed to new replies.
