Confusion with HDFS default block size

This topic contains 2 replies, has 2 voices, and was last updated by Swapnil Patil 5 months, 1 week ago.

  • #48401

    Swapnil Patil
    Participant

    I am a little confused about the HDFS default block size. I have set the block size to 64 MB, and I am importing data from Microsoft SQL Server into HDFS via Sqoop (a database with approximately 500 tables). HDFS shows 2150 total blocks with an average block size of 2165057 B, i.e. approximately 2 MB. Since my default block size is set to 64 MB, why is HDFS using a block size of 2 MB?

    Status: HEALTHY
    Total size: 4654873379 B
    Total dirs: 2522
    Total files: 3350 (Files currently being written: 1)
    Total blocks (validated): 2150 (avg. block size 2165057 B)
    Minimally replicated blocks: 2150 (100.0 %)
    Over-replicated blocks: 0 (0.0 %)
    Under-replicated blocks: 17 (0.7906977 %)
    Mis-replicated blocks: 0 (0.0 %)
    Default replication factor: 3
    Average block replication: 3.0013955
    Corrupt blocks: 0
    Missing replicas: 116 (1.7976135 %)
    Number of data-nodes: 4
    Number of racks: 2

    Thanks in advance :)
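The "avg. block size" line in the report is a derived statistic, not the configured block size. A quick arithmetic check, assuming fsck computes it as total size divided by total block count (integer division), reproduces the reported value from the totals above:

```python
# Totals copied from the fsck report in the question.
total_size_bytes = 4654873379
total_blocks = 2150

# Assumption: fsck's "avg. block size" is total size // total blocks.
avg_block_size = total_size_bytes // total_blocks
print(avg_block_size)                    # 2165057
print(round(avg_block_size / 2**20, 2))  # 2.06 (MiB)

# For comparison, the configured block size (64 MB) in bytes.
configured_block_size = 64 * 2**20
print(configured_block_size)             # 67108864
```

The configured size is about 31 times the reported average, which already suggests that most blocks are far from full.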


The topic ‘Confusion with HDFS default block size’ is closed to new replies.

  • #48504

    Swapnil Patil
    Participant

    Hi Jing,
    Thanks for your reply.
    Yes, my files are quite small (roughly 1 MB each). As I put more files into HDFS, the average block size keeps decreasing: previously it was 2 MB, and now it shows 1 MB. I don't understand why.

    Total size: 4798151501 B
    Total dirs: 4881
    Total files: 6488
    Total blocks (validated): 4134 (avg. block size 1160655 B)
    Minimally replicated blocks: 4134 (100.0 %)
    Over-replicated blocks: 0 (0.0 %)
    Under-replicated blocks: 0 (0.0 %)
    Mis-replicated blocks: 0 (0.0 %)
    Default replication factor: 3
    Average block replication: 3.0
    Corrupt blocks: 0
    Missing replicas: 0 (0.0 %)
    Number of data-nodes: 4
    Number of racks: 2
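Comparing the two fsck reports in this thread shows why the average keeps falling. A minimal sketch using only the reported totals, assuming the average is total size divided by block count (integer division):

```python
# (total size in bytes, total blocks) from the two fsck reports in this thread.
before = (4654873379, 2150)
after = (4798151501, 4134)

added_bytes = after[0] - before[0]   # data written between the two reports
added_blocks = after[1] - before[1]  # blocks created between the two reports

# Average size of the blocks added between the two reports.
marginal_avg = added_bytes // added_blocks
print(added_bytes, added_blocks, marginal_avg)  # 143278122 1984 72216

# Overall average after the new files arrived.
overall_avg = after[0] // after[1]
print(overall_avg)  # 1160655
```

About 1984 new blocks holding roughly 143 MB in total works out to about 72 KB per block, far below the earlier 2 MB average, so the overall average is pulled down toward 1 MB.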

    #48447

    Jing Zhao
    Participant

    Hi Swapnil,

    I guess most of your files are small (< 64 MB)? In HDFS, different files never share a block, so each small file still occupies a block of its own, and that block is only as large as the file's data. This is why your average block size is much smaller than the configured full block size of 64 MB.
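To illustrate the point above, here is a simplified model (a sketch, not HDFS code; the function names are hypothetical): each file is split into ceil(size / block size) blocks, no block is shared between files, the last block holds only the remainder, and empty files are assumed to use no blocks. The same 1 GiB of data gives very different averages depending on how many files it is split into:

```python
MIB = 1024 * 1024
BLOCK_SIZE = 64 * MIB  # configured block size: 64 MB


def blocks_for_file(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Blocks used by one file under the simplified model (integer ceiling).

    Files never share blocks, so any non-empty file needs at least one block;
    an empty file (size 0) is assumed to use none.
    """
    return (size_bytes + block_size - 1) // block_size


def average_block_size(file_sizes, block_size: int = BLOCK_SIZE) -> int:
    """Total bytes divided by total blocks across all files."""
    total_bytes = sum(file_sizes)
    total_blocks = sum(blocks_for_file(s, block_size) for s in file_sizes)
    return total_bytes // total_blocks if total_blocks else 0


# The same 1 GiB of data, stored two different ways.
one_big_file = [1024 * MIB]
many_small_files = [1 * MIB] * 1024

print(average_block_size(one_big_file) // MIB)      # 64
print(average_block_size(many_small_files) // MIB)  # 1
```

With one large file every block is full, so the average equals the configured 64 MB; with a thousand 1 MB files each file gets its own mostly empty block, so the average drops to 1 MB even though the block size setting is unchanged.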
