How To Enable NFS Access to HDFS in Hortonworks Sandbox

In this blog we’ll set up NFS access to HDFS with the Hortonworks Sandbox 1.3. This allows desktop users to read and write files in Hadoop using methods already familiar to them. The Sandbox is a great way to get hands-on with this particular type of access.

If you don’t have it already, then download the sandbox here. Got the download? Then let’s get started.

Start the Sandbox. Get to this screen.

We will now enable Ambari so that we can edit the configuration needed for NFS. Log into the Sandbox over SSH as root. The ‘root’ account password is ‘hadoop’.

Install the NFS Server bits for the Linux OS.

yum install nfs* -y

You may have to enable an externally facing network adapter to allow the yum command to resolve the correct repository. If this is not possible, you will need the package called nfs-utils for CentOS 6.
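If you prefer a narrower install than the wildcard above, the single required package can be installed directly once the repository is reachable; otherwise, obtain the CentOS 6 nfs-utils RPM by other means and install it with rpm.

# Install only the NFS utilities package (requires a reachable yum repository)
yum install nfs-utils -y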

Start the Ambari Server

Documentation on starting Ambari is available by navigating to the IP address provided when the Sandbox started. The steps are summarized below. Please be sure to remember to reboot the virtual machine after you have run the start_ambari script.
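As a rough sketch, assuming the script lives in root’s home directory on the Sandbox 1.3 image (adjust the path if yours differs):

# Run the Ambari start script shipped with the Sandbox, then reboot the VM
./start_ambari.sh
reboot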



You must reboot the Sandbox after you run the Ambari install script.

Open Ambari UI in the browser

Sign in with username: admin, password: admin.

We will now update the HDFS configs to enable NFS. To do that, we’ll need to stop the HDFS and MapReduce services, update the configs, and restart HDFS and MapReduce. MapReduce must be stopped first, followed by HDFS.

Go to Services tab on top, then select MapReduce, and click Stop.


Go to Services tab on top, then select HDFS on the left, and choose Configs sub tab.

Click the Stop button to stop the HDFS services.

A successful service stoppage will show this:

In the Configs tab, open the Advanced section and change the value of dfs.access.time.precision to 3600000. (If you were configuring from the command line instead, you would set this property in hdfs-site.xml.)

In the same section, change the value for dfs.datanode.max.xcievers to 1024.
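For reference, a sketch of the equivalent hdfs-site.xml entries if you were making these two changes by hand instead of through Ambari:

<!-- Values from the steps above; set in hdfs-site.xml, which overrides hdfs-default.xml -->
<property>
  <name>dfs.access.time.precision</name>
  <value>3600000</value>
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>1024</value>
</property>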

In the Custom hdfs-site.xml section, add the following property:

This should then look like:

Then click the Save button.

Start the HDFS services, and then the MapReduce services

You need to stop the native Linux nfs and rpcbind (portmap) services and then start the Hadoop-enabled versions:

service nfs stop
service rpcbind stop

hadoop portmap
hadoop nfs3

To have these started each time you restart your Sandbox, you can add a few lines to your rc.local startup script:

hadoop-daemon.sh start portmap
hadoop-daemon.sh start nfs3

This will place logs for each service in /var/log/hadoop.

Verify that the NFS server is up and running on the Sandbox with the rpcinfo command. You may also run the showmount command, both on the Sandbox and on the client machine. You should see output similar to the output below, stating that “/” is available to everyone.
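For example (HOSTIP is the Sandbox’s IP address):

# List the RPC services registered on the Sandbox; portmapper, mountd and nfs should appear
rpcinfo -p HOSTIP
# List the exported file systems; "/" should be shown as exported to everyone
showmount -e HOSTIP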

Create a user on your client machine that matches a user in the Sandbox HDP VM.

For example, hdfs is a user on the Sandbox VM. The UID for hdfs is 497.

On my client machine, which happens to be a Mac OS X machine, I’ll create a user hdfs with the same UID using the following commands:

sudo -i
mkdir /Users/hdfs
dscl . create /Users/hdfs
dscl . create /Users/hdfs RealName "hdfs"
dscl . create /Users/hdfs hint "Password Hint"
dscl . passwd /Users/hdfs hdfs
dscl . create /Users/hdfs UniqueID 497
dscl . create /Users/hdfs PrimaryGroupID 201
dscl . create /Users/hdfs UserShell /bin/bash
dscl . create /Users/hdfs NFSHomeDirectory /Users/hdfs
chown -R hdfs:guest /Users/hdfs

If you are on another operating system, create a user hdfs with the UID 497 to match the user on the Sandbox VM. This is easily accomplished on Linux using the -u option to the adduser command (see the sketch below). On Windows you will likely want to use an NFS client such as this one; on Server and premium editions of Windows, the answer includes adding the Subsystem for UNIX-based Applications.
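On a Linux client, for example, a single command does it (a sketch, assuming UID 497 is not already taken on the client):

# Create a local hdfs user whose UID matches the Sandbox's hdfs user
adduser -u 497 hdfs      # or: useradd -u 497 hdfs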

Mount HDFS as a file system on local client machine

mount -t nfs -o vers=3,proto=tcp,nolock HOSTIP:/  /PATH/TO/MOUNTPOINT
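For example, assuming the Sandbox answers at 192.168.56.101 (a placeholder; use the IP shown when your Sandbox boots) and the mount point used later in this post:

# Create the mount point and mount the root of HDFS over NFS
mkdir -p /Users/hdfs/mnt
mount -t nfs -o vers=3,proto=tcp,nolock 192.168.56.101:/ /Users/hdfs/mnt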

Now browse HDFS as if it were part of the local filesystem.

Load data off HDFS onto the local file system:
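A sketch, using the mount point above and a hypothetical file name:

# Copy a file out of HDFS onto the local disk through the NFS mount
cp /Users/hdfs/mnt/user/hdfs/somefile.txt ~/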

Delete data in HDFS:
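Again a sketch with a hypothetical file name:

# Remove a file from HDFS through the NFS mount
rm /Users/hdfs/mnt/user/hdfs/somefile.txt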

Load data into HDFS. Take a file from the local disk, mahout.zip, and load it into the hdfs user directory on the HDFS file system. On this local machine, HDFS is mounted at /Users/hdfs/mnt/.
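A sketch of that copy:

# Copy mahout.zip from the local disk into the hdfs user's home directory in HDFS
cp mahout.zip /Users/hdfs/mnt/user/hdfs/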

Additionally, you can verify that your files are in HDFS via the file browser in the Hue interface provided with the Sandbox. Or, returning to the command line, you can switch to the hdfs user (su - hdfs) and use standard hadoop command-line commands to query for your files.
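For example, back on the Sandbox (the path assumes the default hdfs home directory):

# Switch to the hdfs user and list its home directory in HDFS
su - hdfs
hadoop fs -ls /user/hdfs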

Conclusion

Using this interface allows users of a Hadoop cluster to rapidly push data to HDFS in a way that is already familiar from their desktops. It also opens up the possibility of scripting data pushes into Hadoop from any networked machine, including upstream preprocessing of data from other systems.

Categorized by: Ambari, Hadoop in the Enterprise, Sandbox

Comments

Preetham | May 6, 2014 at 11:31 pm

How do we enable UNIX_AUTH, and what configuration changes need to be made? Can you please explain?

Brandon Li | August 25, 2013 at 8:18 pm

On my Mac OS, VirtualBox by default used NAT network mode for the guest, and thus the NFS export was not visible to the host or to a different machine.

After I changed the guest network mode to Bridged Adapter, I could access the NFS service from outside.
