I'm guessing most of your files are small (well under 64 MB)? In HDFS, different files never share the same block, so each small file still occupies its own block, and that block only takes up as much space as the data written into it. The configured 64 MB is an upper limit per block, not a guaranteed size, which is why the average block size fsck reports is much smaller than 64 MB.
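You can sanity-check this from your own fsck numbers: 4654873379 B spread over 3350 files is roughly 1.4 MB per file on average, so each file fits comfortably inside a single small block, and 4654873379 B / 2150 blocks is about 2165057 B, which is exactly the "avg. block size" fsck prints. If you want to compare the configured block size against the block size actually recorded for a file, something along these lines should work (the path below is just a placeholder for one of your Sqoop-imported tables, and the configured value lives in hdfs-site.xml as dfs.blocksize, or dfs.block.size on older releases):

hdfs dfs -stat "blocksize=%o length=%b name=%n" /user/sqoop/mytable/part-m-00000
hdfs fsck /user/sqoop/mytable -files -blocks

The -stat line prints the per-file block size and the actual file length, and the fsck line lists every file in the directory together with the blocks it occupies, so you can see that each small file sits in one partially filled block. On older Hadoop versions the equivalent commands are hadoop fs -stat and hadoop fsck.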
I am a little confused about the HDFS default block size. I have set the block size to 64 MB and am importing data from Microsoft SQL Server into HDFS via Sqoop (a database with approximately 500 tables). fsck shows 2150 total blocks with an average block size of 2165057 B, i.e. roughly 2 MB. But the default block size is set to 64 MB, so why is HDFS reporting a block size of about 2 MB?
Total size: 4654873379 B
Total dirs: 2522
Total files: 3350 (Files currently being written: 1)
Total blocks (validated): 2150 (avg. block size 2165057 B)
Minimally replicated blocks: 2150 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 17 (0.7906977 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0013955
Corrupt blocks: 0
Missing replicas: 116 (1.7976135 %)
Number of data-nodes: 4
Number of racks: 2
Thanks in advance