HDFS Forum

Confusion with HDFS default block size

  • #48401
    Swapnil Patil
    Participant

    I'm a bit confused about the HDFS default block size. I have set the block size to 64 MB, and I am importing data from Microsoft SQL Server into HDFS via Sqoop (a database with approximately 500 tables). HDFS reports 2150 total blocks with an average block size of 2165057 B, i.e. approximately 2 MB. But I have the default block size set to 64 MB, so why is HDFS showing a block size of about 2 MB?

    ..Status: HEALTHY
    Total size: 4654873379 B
    Total dirs: 2522
    Total files: 3350 (Files currently being written: 1)
    Total blocks (validated): 2150 (avg. block size 2165057 B)
    Minimally replicated blocks: 2150 (100.0 %)
    Over-replicated blocks: 0 (0.0 %)
    Under-replicated blocks: 17 (0.7906977 %)
    Mis-replicated blocks: 0 (0.0 %)
    Default replication factor: 3
    Average block replication: 3.0013955
    Corrupt blocks: 0
    Missing replicas: 116 (1.7976135 %)
    Number of data-nodes: 4
    Number of racks: 2

    Thanks in advance :)


  • #48447
    Jing Zhao
    Moderator

    Hi Swapnil,

    I guess most of your files are not big (< 64 MB)? Currently in HDFS, different files do not share the same block, so each small file still occupies a block of its own. That makes your average block size smaller than the configured block size of 64 MB.
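    This is easy to confirm from the fsck output quoted above: the "avg. block size" fsck prints is simply total size divided by total block count, and since each small file gets its own block, that average tracks the typical file size rather than the configured 64 MB limit. A quick sanity check in Python, using the numbers from the first report:

    ```python
    # Numbers taken from the first fsck report in this thread.
    total_size_bytes = 4654873379   # "Total size"
    total_blocks = 2150             # "Total blocks (validated)"

    # fsck's "avg. block size" is just total size / total blocks.
    avg_block_size = total_size_bytes // total_blocks
    print(avg_block_size)  # 2165057 B (~2 MB) -- matches the report
    ```

    The 64 MB setting is only an upper bound on how large a single block may grow; it does not pad small files out to 64 MB on disk.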

    #48504
    Swapnil Patil
    Participant

    Hi Jing,
    Thanks for your reply.
    Yes, my files are pretty small (approximately 1 MB or less in size). As I put more files into HDFS, the average block size keeps decreasing: previously it was 2 MB, and now it shows 1 MB. I don't understand that.

    Total size: 4798151501 B
    Total dirs: 4881
    Total files: 6488
    Total blocks (validated): 4134 (avg. block size 1160655 B)
    Minimally replicated blocks: 4134 (100.0 %)
    Over-replicated blocks: 0 (0.0 %)
    Under-replicated blocks: 0 (0.0 %)
    Mis-replicated blocks: 0 (0.0 %)
    Default replication factor: 3
    Average block replication: 3.0
    Corrupt blocks: 0
    Missing replicas: 0 (0.0 %)
    Number of data-nodes: 4
    Number of racks: 2
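    The drop in the average follows directly from the two reports: every newly imported small file adds one block of roughly its own size, so if the new files are smaller than the current average, the average falls. A back-of-the-envelope check (the per-block figure for the newly added files is an inference from the two snapshots, not something fsck reports directly):

    ```python
    # (total size in bytes, total blocks) from the two fsck reports above.
    report1 = (4654873379, 2150)
    report2 = (4798151501, 4134)

    for size, blocks in (report1, report2):
        print(size // blocks)  # 2165057, then 1160655 -- matches both reports

    # The blocks added between the two snapshots were even smaller on
    # average, which is what drags the overall figure down:
    new_bytes = report2[0] - report1[0]
    new_blocks = report2[1] - report1[1]
    print(new_bytes // new_blocks)  # ~72 KB per newly added block
    ```

    So the average will keep sliding toward the typical size of the files being imported; it says nothing about the configured block size.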

The topic ‘Confusion with HDFS default block size’ is closed to new replies.
