Choosing the appropriate Linux file system for HDFS deployment
The Hadoop Distributed File System is platform independent and can run on top of any underlying file system and operating system. Linux offers a variety of file system choices, each with caveats that affect HDFS.
As a general best practice, if you are mounting disks solely for Hadoop data, mount them with the ‘noatime’ option. This stops the kernel from updating a file's access time on every read, which speeds up file reads.
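As a rough illustration of the noatime advice, the sketch below checks which Hadoop data directories are mounted without that option. It parses text in the format of /proc/mounts (device, mount point, file system type, options). The function name, the sample devices, and the /data/1 and /data/2 paths are hypothetical, and the sketch assumes each data directory is itself a mount point.

```python
def mounts_missing_noatime(mounts_text, data_dirs):
    """Return the data directories whose mount entry lacks the noatime option.

    mounts_text is expected to look like the contents of /proc/mounts.
    A directory with no matching mount entry is also reported.
    """
    options_by_mount = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # Fields: device, mount point, fs type, comma-separated options, ...
            options_by_mount[fields[1]] = fields[3].split(",")

    missing = []
    for data_dir in data_dirs:
        options = options_by_mount.get(data_dir)
        if options is None or "noatime" not in options:
            missing.append(data_dir)
    return missing


# Hypothetical sample in /proc/mounts format (not taken from a real cluster).
sample = (
    "/dev/sdb1 /data/1 ext3 rw,noatime 0 0\n"
    "/dev/sdc1 /data/2 ext3 rw,relatime 0 0\n"
)
print(mounts_missing_noatime(sample, ["/data/1", "/data/2"]))
```

On a real node, the text could be read from /proc/mounts, and any directory the function reports would be a candidate for remounting with noatime.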
There are three popular Linux file system options to choose from:
Yahoo uses the ext3 file system for its Hadoop deployments. ext3 is also the default file system for many popular Linux distributions. Since HDFS on ext3 has been publicly tested on Yahoo’s cluster, it is a safe choice for the underlying file system.
ext4 is the successor to ext3. ext4 has better performance with large files.…