HDFS and under-replicated blocks

  • #19163
    Francois BORIE
    Participant

    Hello,

    I have a running cluster with 3 live datanodes and a default HDFS replication factor of 3. There are only 111 blocks in total on my datanodes, but 45 of them are still reported as “under-replicated”, even after leaving the cluster running for several days.

    I don’t understand why, because the namenode should automatically handle this replication.

    But are these blocks really under-replicated?

    I’ve seen some threads on the web suggesting that it could be a “display” bug in all Hadoop 0.20 versions (for example, this one: http://stackoverflow.com/questions/7997587/under-replicated-blocks-count-is-inaccurate-buy-why)

    Do you agree with that? Or should I always have 0 under-replicated blocks?

    Many thanks for your help,

    Regards,

    François

  • #19164
    abdelrahman
    Moderator

    Hi François,

    How is your day so far? This could be a bug, but let us find out more about the issue first. Please run the following commands as the hdfs user on the namenode:
    # hadoop fsck / -locations -blocks -files | grep -i -C6 miss
    # hadoop version
    Please post the output of the commands in the forum.

    Thanks
    -Abdelrhaman

    #19257
    Francois BORIE
    Participant

    Hi Abdelrhaman,

    Thanks for your answer.

    You will find below the output of the commands you asked for:

    -bash-4.1$ hadoop fsck / -locations -blocks -files | grep -i -C6 miss
    Over-replicated blocks: 0 (0.0 %)
    Under-replicated blocks: 45 (42.056076 %)
    Mis-replicated blocks: 0 (0.0 %)
    Default replication factor: 3
    Average block replication: 3.0
    Corrupt blocks: 0
    Missing replicas: 315 (98.130844 %)
    Number of data-nodes: 3
    Number of racks: 1
    FSCK ended at Thu Mar 28 11:09:08 CET 2013 in 1696 milliseconds

    The filesystem under path ‘/’ is HEALTHY

    -bash-4.1$ hadoop version
    Hadoop 1.1.2.21
    Subversion -r
    Compiled by jenkins on Thu Jan 10 03:38:39 PST 2013
    From source with checksum ce0aa0de785f572347f1afee69c73861

    Many thanks,

    Regards,

    François
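The figures in the fsck summary above can be cross-checked with a little arithmetic. The sketch below is only a sanity check on the posted numbers, not a diagnosis; the function names `missing_per_block` and `implied_total` are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Figures copied from the fsck summary posted above.
UNDER_REPLICATED=45
MISSING_REPLICAS=315

# Average number of missing replicas per under-replicated block.
missing_per_block() {
    echo $(( $2 / $1 ))
}

# Total implied by "part is pct % of the total", rounded to an integer.
implied_total() {
    awk -v part="$1" -v pct="$2" 'BEGIN { printf "%.0f\n", part * 100 / pct }'
}

echo "missing replicas per block: $(missing_per_block "$UNDER_REPLICATED" "$MISSING_REPLICAS")"
echo "implied total blocks:       $(implied_total "$UNDER_REPLICATED" 42.056076)"
echo "implied existing replicas:  $(implied_total "$MISSING_REPLICAS" 98.130844)"
```

This works out to 7 missing replicas per under-replicated block, 107 blocks, and 321 existing replicas (107 blocks at an average replication of 3.0). Three existing replicas plus seven missing ones would be consistent with those 45 blocks belonging to files whose requested replication is 10 rather than the cluster default of 3, which a 3-datanode cluster cannot satisfy. One possible source in Hadoop 1.x is job submission files, whose replication is controlled by `mapred.submit.replication` (default 10), but nothing in the thread confirms this for this cluster.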

    #19295
    Larry Liu
    Moderator

    Hi, Francois

    What is the topology of your cluster? If all 3 datanodes are in the same rack, under-replication can occur. I recommend using a topology script to place the 3 datanodes logically in 2 racks.

    Thanks
    Larry
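For readers unfamiliar with topology scripts, the suggestion above could be sketched roughly as follows. Hadoop invokes the script with one or more datanode addresses as arguments and reads one rack path per argument from standard output. The IP addresses, hostnames, and rack names below are hypothetical placeholders, not values from this thread, and `rack_for` is an illustrative helper.

```shell
#!/bin/sh
# Minimal topology script sketch. All addresses and rack names are
# hypothetical; replace them with the cluster's real values.

# Map one datanode address (IP or hostname) to a rack path.
rack_for() {
    case "$1" in
        192.168.1.11|datanode1) echo "/rack1" ;;
        192.168.1.12|datanode2) echo "/rack1" ;;
        192.168.1.13|datanode3) echo "/rack2" ;;
        *)                      echo "/default-rack" ;;
    esac
}

# Hadoop may pass several addresses in one call; print one rack per argument.
for node in "$@"; do
    rack_for "$node"
done
```

In Hadoop 1.x the script is registered through the `topology.script.file.name` property in core-site.xml, and it must be executable by the user running the namenode. Whether a two-rack layout actually clears the under-replicated count on this cluster is not established in the thread.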

    #19302
    Francois BORIE
    Participant

    Hi Larry,

    Thanks for the confirmation. You’re right: my 3 datanodes are in the same rack (see the output of the hadoop fsck command I sent to Abdelrhaman).

    I think I will wait for Ambari to become rack-aware (I’ve seen it’s on your roadmap as AMBARI-645) before I start playing with those parameters.

    Thanks,

    Regards,

    François
