Simplifying data management: NFS access to HDFS
We are excited that another critical Enterprise Hadoop integration requirement – NFS Gateway access to HDFS – is making progress through the main Apache Hadoop trunk. This effort is architected and designed by Brandon Li and Suresh Srinivas, and is being delivered by the community. You can track progress in Apache JIRA HDFS-4750.
With NFS access to HDFS, you can mount the HDFS cluster as a volume on client machines and use native command-line tools, scripts, or a file explorer UI to view HDFS files and load data into HDFS. NFS thus enables file-based applications to read and write files directly in Hadoop. This greatly simplifies data management in Hadoop and expands the integration of Hadoop into existing toolsets.
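To make the idea of file-based access concrete, here is a minimal Python sketch of an application that uses only ordinary file I/O. It assumes the HDFS cluster has already been mounted through the gateway; the `HDFS_NFS_MOUNT` environment variable and the `load_and_list` helper are hypothetical names chosen for illustration and are not part of Hadoop. When the variable is unset, the script falls back to a temporary local directory so that it still runs.

```python
import os
import tempfile
from pathlib import Path


def load_and_list(mount_root, name, text):
    """Write text to a file under mount_root, read it back, and list the directory.

    mount_root is assumed to be a directory where HDFS is mounted over NFS,
    but any writable directory works, because only ordinary file I/O is used.
    """
    root = Path(mount_root)
    target = root / name
    # Sequential write of a whole file, as a script or ETL job would do.
    target.write_text(text, encoding="utf-8")
    # Read the same file back through the same mount.
    read_back = target.read_text(encoding="utf-8")
    # Browse the directory the way a shell or file explorer would.
    listing = sorted(p.name for p in root.iterdir())
    return read_back, listing


if __name__ == "__main__":
    # HDFS_NFS_MOUNT is a hypothetical variable naming the mount point.
    mount_root = os.environ.get("HDFS_NFS_MOUNT")
    if mount_root:
        print(load_and_list(mount_root, "events.csv", "id,value\n1,42\n"))
    else:
        # No mount configured: demonstrate against a temporary local directory.
        with tempfile.TemporaryDirectory() as tmp:
            print(load_and_list(tmp, "events.csv", "id,value\n1,42\n"))
```

The point of the sketch is that the code contains no Hadoop-specific client calls: once the volume is mounted, existing file-based tools need no changes beyond a path.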
NFS and HDFS
Network File System (NFS) is a distributed file system protocol that allows access to files on a remote computer in a manner similar to how a local file system is accessed. With an NFS gateway for Hadoop, files in HDFS can now be browsed, downloaded, and written as if HDFS were a local file system. These are critical enterprise requirements.
Bringing the full capability of NFS to HDFS is an important strategic initiative for us. In the first phase, we have enabled NFSv3 interface access to HDFS. This is done using the NFS Gateway, a stateless daemon that translates the NFS protocol to HDFS access protocols, as shown in the following diagram. Multiple instances of the daemon can be run to provide high-throughput read/write access to HDFS from multiple clients. As part of this work, HDFS now has significant new functionality that supports inode IDs, or file handles, which was done in Apache JIRA HDFS-4489.
We are excited to work with the community to enable a robust roadmap for NFS functionality, focussing on the following capabilities:
- NFSv4 and other protocols for access to HDFS
- Highly Available NFS Gateway
- Secure Hadoop (Kerberos) integration
The first phase of development is complete and is undergoing rigorous testing and stabilization. This set of functionality is being run through our integrated HDP stack test suite to ensure enterprise readiness.
The NFS Gateway functionality is being made available in the community and can be tracked in JIRA HDFS-4750.