May 13, 2013

Simplifying data management: NFS access to HDFS

We are excited that another critical Enterprise Hadoop integration requirement – NFS Gateway access to HDFS – is making progress through the main Apache Hadoop trunk.  This effort is architected and designed by Brandon Li and Suresh Srinivas, and is being delivered by the community. You can track progress in Apache JIRA HDFS-4750.

With NFS access to HDFS, you can mount the HDFS cluster as a volume on client machines and use native command-line tools, scripts, or a file-explorer UI to view HDFS files and load data into HDFS. NFS thus enables file-based applications to perform file read and write operations directly against Hadoop. This greatly simplifies data management in Hadoop and expands the integration of Hadoop into existing toolsets.
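
To make this concrete, here is a minimal sketch from a client machine; the gateway host name, mount point, and file paths are hypothetical, and the mount options are the ones the NFSv3 gateway expects:

    # Mount the HDFS namespace exported by an NFS gateway (hypothetical host).
    # NFSv3 over TCP; 'nolock' because the gateway does not support the
    # NLM file-locking protocol.
    mount -t nfs -o vers=3,proto=tcp,nolock nfs-gateway-host:/ /mnt/hdfs

    # Ordinary file tools now operate directly on HDFS.
    ls /mnt/hdfs/user
    cp /tmp/sales.csv /mnt/hdfs/landing/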

NFS and HDFS

Network File System (NFS) is a distributed file system protocol that allows access to files on a remote computer in a manner similar to how a local file system is accessed. With an NFS gateway for Hadoop, files can now be browsed, downloaded, and written to and from HDFS as if it were a local file system. These are critical enterprise requirements.

Bringing the full capability of NFS to HDFS is an important strategic initiative for us. In the first phase, we have enabled NFSv3 interface access to HDFS. This is done using the NFS Gateway, a stateless daemon that translates the NFS protocol into HDFS access protocols, as shown in the following diagram. Many instances of this daemon can be run to provide high-throughput read/write access to HDFS from multiple clients. As part of this work, HDFS gained a significant piece of functionality: support for inode IDs, used as file handles, delivered in Apache JIRA HDFS-4489.

[Diagram: NFS Gateway daemons translating the NFS protocol into HDFS access protocols]
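
As a rough sketch, bringing up a gateway instance looks like the following; the commands follow the Apache Hadoop NFS gateway user guide, and exact invocations can vary by release:

    # Start the gateway's own portmap (requires root; stop any
    # system-provided portmap/rpcbind service first):
    hadoop-daemon.sh start portmap

    # Start mountd and nfsd, both provided by the single nfs3 daemon
    # (no root privileges required):
    hadoop-daemon.sh start nfs3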

We are excited to work with the community to enable a robust roadmap for NFS functionality, focusing on the following capabilities:

  • NFSv4 and other protocols for access to HDFS
  • Highly Available NFS Gateway
  • Secure Hadoop (Kerberos) integration

The first phase of development is complete and is undergoing rigorous testing and stabilization. This set of functionality is being run through our integrated HDP stack test suite to ensure enterprise readiness.

The NFS Gateway functionality is being made available in the community and can be tracked in JIRA HDFS-4750.

Comments

Nikhil Mulley says:

Hi Srinivas,

This is a great value addition to the Hadoop infrastructure stack. Though I have not seen the JIRA pages yet, what happened to providing mount functionality for HDFS on client machines using FUSE? Is there a difference in functionality or approach (and possibly any gains) between using the NFS gateway and mounting HDFS via FUSE?

Nikhil

Brandon Li says:

There are a few problems with using FUSE to provide an NFS mount for HDFS.
Unlike NFSv3, FUSE is not inode based; it usually uses the file path to generate the NFS file handle. Its path-to-handle mapping can make the host run out of memory, and even if it works around the memory problem, it can have correctness issues: FUSE may not be aware that a file's path has been changed by other means (e.g., the Hadoop CLI). Also, if FUSE is used on the client side, each NFS client has to install a client component, which so far runs only on Linux.

Nikhil Mulley says:

Another question, just a quick thought: since NFS-like behaviour is being brought to Hadoop, can I use exports.local-style semantics and say that particular namespaces are allowed only to a particular set of named hosts, IP addresses, or groups of hosts via netgroups, and add some user-level ACLs as well? If so, I could keep the filesystem exposure via NFS controlled and restrictive.

Brandon Li says:

Yes. JIRA HDFS-4947 tracks the effort to support an export table in the NFS gateway.

Asim Praveen says:

Can the NFS gateway be collocated with each NFS client, as a special case of running multiple gateways?

Brandon Li says:

Yes. The NFS gateway machine needs everything required to run an HDFS client, such as the Hadoop core JAR files and the HADOOP_CONF directory.

The NFS gateway can be on any DataNode, NameNode, or any HDP client machine.

Evan Zhou says:

Can we set up multiple NFS gateway services?
For example, we have several clients that could connect to different gateways; that way, read/write throughput could be improved.

Brandon Li says:

Yes. You can start multiple NFS gateways on DataNodes or client nodes to improve throughput.
Also, each gateway can export a different directory by configuring "dfs.nfs3.export.point" (renamed to "nfs.export.point" in Hadoop 2.5 and later releases). By default, the only export is "/".

Thanks,
Brandon
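
To illustrate Brandon's point, a hypothetical two-gateway setup where each gateway exports a different HDFS subtree (host names and paths are made up):

    # On gateway1, set the export point in hdfs-site.xml:
    #   <property>
    #     <name>dfs.nfs3.export.point</name>   <!-- nfs.export.point in 2.5+ -->
    #     <value>/data/incoming</value>
    #   </property>
    # On gateway2, export /data/archive the same way. Clients then mount
    # each subtree from its own gateway:
    mount -t nfs -o vers=3,proto=tcp,nolock gateway1:/data/incoming /mnt/incoming
    mount -t nfs -o vers=3,proto=tcp,nolock gateway2:/data/archive /mnt/archive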

Ben says:

If I have multiple clients and multiple NFS gateways, I can mount gateway 1 on clients 1-3 and gateway 2 on clients 4-6, but that is not manageable on the client side. Does a load balancer work with multiple NFS gateways?

Another question: Which one is faster, NFS gateway or HttpFS?

Brandon Li says:

Currently there is no built-in load balancer for the NFS gateway. The client needs to manage the load on each mount point if exports from multiple NFS gateways are mounted on the same client.

Regarding performance, I think it depends on the workload. The NFS gateway works well with large amounts of small-file manipulations.

K Francisco says:

Nice. A win for ease of management and ad-hoc file access/updates.

Ryan Gerlach says:

Hi, does the NFS gateway in HDP 2.0 support clusters using Kerberos?
Ryan

Brandon Li says:

Not yet, but Kerberos support is on the roadmap and we are working on the implementation.

Thanks,
Brandon

Ryan says:

Thanks Brandon. I am trying to track down whether Kerberos support is included in HDP 2.1; do you know if it is? I took code from this JIRA and applied it to our HDP 2.0.
https://issues.apache.org/jira/browse/HDFS-5804
Thanks,
Ryan

Brandon Li says:

Hi Ryan, HDP 2.1 will include Kerberos support. With HDP 2.1, the NFS gateway can access a secure HDFS cluster, as discussed in HDFS-5804.

Thanks,
Brandon

Bhaskie says:

Is there any performance improvement if an application uses NFS to write data to HDFS?

Brandon Li says:

The NFS gateway uses DFSClient to access HDFS. There has recently been some performance improvement work, including a comparison with FUSE, in JIRA HDFS-6080 (https://issues.apache.org/jira/browse/HDFS-6080).

Jacinda says:

How can I access HDFS via NFS? I don't know the complete procedure!

Brandon Li says:

If you are using Apache Hadoop, here is the user guide for the 2.3 release: http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html

If you are using HDP 2.0, you can find the user guide here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_user-guide/content/user-guide-hdfs-nfs.html

Jacinda says:

How can I get the complete NFS-to-HDFS setup working? Is it just a matter of configuring hdfs-default.xml and then starting the portmap and nfs3 services? I can't start the portmap service; it stops right after opening. Please help me, thank you very much!

Eric W says:

Would it be crazy to install an NFS gateway on each DataNode and use a load balancer (TCP or DNS RR) to spread NFS connections across the entire cluster?

Brandon Li says:

Actually, starting the NFS gateway on multiple DataNodes is one way to increase throughput, for example by using each gateway to load or download a different batch of files.

Since HDFS doesn't support multiple writers, spreading writes of the same file across multiple NFS gateways will be a problem. However, reads should be fine.

Preetham says:

Do we have to set up UIDs on the NFS server to implement AUTH_UNIX? What configuration changes are required to implement AUTH_UNIX?

Brandon Li says:

AUTH_UNIX is usually the default NFS security policy, so you don't have to do anything special.

If the user accounts in your cluster are managed by a name service such as LDAP or NIS, the UID will be the same for the same user on both the client and the server.

Thanks,
Brandon
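
For illustration, a quick way to verify the UID mapping Brandon describes (user and host names are hypothetical):

    # AUTH_UNIX identifies users by numeric UID, so the same account must
    # resolve to the same UID on the NFS client and on the gateway.
    id alice                       # run on the NFS client
    ssh nfs-gateway-host id alice  # compare the UID/GID on the gateway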

Hari says:

Can I use NFS over HDFS for running virtual machines in XenServer/VMware?

Brandon Li says:

If you want to mount the NFS export on a virtual machine, yes, you can. It's no different from mounting it on a physical Linux box.

Please let me know if I misunderstood your question.

Thanks,
Brandon

Harsh says:

I would like to integrate HDFS and NFS so that I can create analytics pipelines using open source tools that are not Hadoop-aware (they depend on the NFS filesystem) and then do the translation into HDFS. Does NFS Gateway access to HDFS provide this ability? Are there drivers or modules available? Thanks!

Brandon Li says:

No extra drivers or modules are needed. After you start HDFS and the NFS gateway, you can mount the HDFS export as a regular NFS export. In terms of data ingestion, random writes to an existing file are not supported yet; file append is supported.

Thanks,
Brandon

Daniel Kalinowski says:

Hello,
Can someone explain to me how NFS will work with an HA cluster in a failover scenario? More specifically, I have mounted my resources like this:
mount -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point, where $server is my active NameNode IP.
Everything works fine, but suppose some error happens and my standby NameNode becomes the active NameNode;
then my mount won't work, am I right?
If I'm wrong, please correct me, ideally with some docs.
Best regards, Daniel

Brandon Li says:

Hi Daniel,
NFS gateway can be started on NameNode, DataNode or even a DFS client node. When you mount the export, you can use the IP address where the gateway is running. The gateway is essentially a DFSClient. When failover happens, the DFSClient inside NFS gateway can automatically connect to the new active NameNode.

Thanks,
Brandon
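
Put another way (an illustrative sketch with a hypothetical gateway host): point the mount at the gateway rather than at a NameNode, and let the gateway's embedded DFSClient follow the failover:

    # Mount through the gateway host, not the active NameNode's IP.
    # The gateway's DFSClient, configured with the HA nameservice in
    # core-site.xml/hdfs-site.xml, reconnects to the new active NameNode
    # on failover, so the NFS mount itself is unaffected.
    mount -t nfs -o vers=3,proto=tcp,nolock nfs-gateway-host:/ /mnt/hdfs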

Daniel Kalinowski says:

Thanks for the fast response, Brandon. I might be greedy, but can you support me with some example/documentation?
Let's hope the answer will satisfy my boss.
I really appreciate your help.
Daniel

Brandon Li says:

Here is the most recently released NFS user guide:
https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html

aikinhdo says:

So it is not possible to have a proxy in front of NFS? If clientA mounts HDFS from serverA, and serverA fails, reboots, or crashes, then clientA loses access to HDFS. Is that correct?
