Using the command line to manage files on HDFS

In this tutorial we will walk through some of the basic HDFS commands you will need to manage files on HDFS. To complete this tutorial you will need a working HDP cluster. The easiest way to get a Hadoop cluster is to download the Hortonworks Sandbox.

Let’s get started.

Step 1: Let’s create a directory in HDFS, upload a file, and list it.

Let’s look at the syntax first:

hadoop fs -mkdir:
  • Takes one or more path URIs as arguments and creates a directory at each path
            hadoop fs -mkdir <paths> 
            hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
            hadoop fs -mkdir hdfs://
hadoop fs -ls:
  • Lists the contents of a directory
  • For a file, returns the file’s stats
            hadoop fs -ls <args>
            hadoop fs -ls /user/hadoop/dir1 /user/hadoop/dir2
            hadoop fs -ls /user/hadoop/dir1/filename.txt
            hadoop fs -ls hdfs://<hostname>:9000/user/hadoop/dir1/

Let’s run the commands above on the sandbox. You can SSH to the sandbox using a tool such as PuTTY, which is available as a free download.

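Putting the two commands together, a minimal session on the sandbox might look like the sketch below. The /user/hadoop paths are illustrative, and the guard lets the script exit cleanly on a machine without the hadoop client:

```shell
# Sketch of Step 1; assumes a running HDFS and the hadoop client on PATH.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2  # create two directories
  hadoop fs -ls /user/hadoop/                           # list what was created
else
  echo "hadoop client not found; run this inside the sandbox"
fi
```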

Let’s touch a file locally.

$ touch filename.txt

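The newly touched file can then be uploaded with -put and verified with -ls. A sketch, assuming the sandbox’s hadoop client and an existing /user/hadoop/dir1 (both path and file name are illustrative):

```shell
# Create an empty local file, then upload it to an illustrative HDFS path.
touch filename.txt
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -put filename.txt /user/hadoop/dir1/   # upload to HDFS
  hadoop fs -ls /user/hadoop/dir1/                 # confirm it arrived
fi
```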

Step 2: Now, let’s check how to find out space utilization in an HDFS directory.

hadoop fs -du:
  • Displays the sizes of files and directories contained in the given directory, or the size of a file if it’s just a file
            hadoop fs -du URI
            hadoop fs -du  /user/hadoop/ /user/hadoop/dir1/Sample.txt

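A quick check of usage for the directories created earlier might look like this sketch (sandbox only; the -s flag, which prints a single summary line for a subtree, is available in the Hadoop 2 fs shell):

```shell
# Show per-entry sizes, then a one-line summary for a subtree.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -du /user/hadoop/          # size of each file/directory inside
  hadoop fs -du -s /user/hadoop/dir1/  # single summary line for the subtree
fi
```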

Step 4:

Now let’s see how to upload files to and download files from the Hadoop Distributed File System (HDFS).
Upload (we have already tried this earlier):

hadoop fs -put:
  • Copies a single src file, or multiple src files, from the local file system to HDFS
            hadoop fs -put <localsrc> ... <HDFS_dest_Path>
            hadoop fs -put /home/ec2-user/Samplefile.txt ./ambari.repo /user/hadoop/dir3/

hadoop fs -get:

  • Copies/downloads files from HDFS to the local file system
            hadoop fs -get <hdfs_src> <localdst> 
            hadoop fs -get /user/hadoop/dir3/Samplefile.txt /home/

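Putting -put and -get together, a round trip might look like this sketch (paths and file names are illustrative; the downloaded copy is renamed so it does not collide with the original):

```shell
# Upload a local file, then download it back under a different local name.
printf 'hello hdfs\n' > Samplefile.txt
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -put Samplefile.txt /user/hadoop/dir3/
  hadoop fs -get /user/hadoop/dir3/Samplefile.txt ./Samplefile.copy.txt
fi
```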

Step 5: Let’s quickly look at two advanced features.

hadoop fs -getmerge:
  • Takes a source directory and concatenates the files in it into a single file on the local destination
            hadoop fs -getmerge <src> <localdst> [addnl]
            hadoop fs -getmerge /user/hadoop/dir1/  ./Samplefile2.txt
            addnl: can be set to add a newline at the end of each file
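Conceptually, -getmerge is “download and concatenate.” The effect on the destination side can be sketched with plain shell (demo_dir and merged.txt are hypothetical names, standing in for the HDFS source and the local destination):

```shell
# Local analogy of what -getmerge produces on the destination side.
mkdir -p demo_dir
printf 'first file\n'  > demo_dir/a.txt
printf 'second file\n' > demo_dir/b.txt
cat demo_dir/*.txt > merged.txt   # files concatenated in name order
```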
hadoop distcp:
  • Copies files or directories recursively
  • It is a tool used for large inter-/intra-cluster copying
  • It uses MapReduce to effect its distribution, copying, error handling and recovery, and reporting
            hadoop distcp <srcurl> <desturl>
            hadoop distcp hdfs://<NameNode1>:8020/user/hadoop/dir1/ \
                          hdfs://<NameNode2>:8020/user/hadoop/dir1/

You can use the following steps to perform getmerge and distcp.
Let’s upload two files for this exercise first:

# touch txt1 txt2
# hadoop fs -put txt1 txt2 /user/hadoop/dir2/
# hadoop fs -ls /user/hadoop/dir2/

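To finish the exercise (sandbox only), the two uploaded files can be merged into a single local file; a distcp of the same directory to a second cluster might look like the lines below, with NN1/NN2 standing in as placeholder NameNode hosts:

```shell
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -getmerge /user/hadoop/dir2/ ./merged_txt12   # txt1 + txt2 as one local file
  NN1=namenode1.example.com   # placeholder: source cluster NameNode
  NN2=namenode2.example.com   # placeholder: destination cluster NameNode
  hadoop distcp "hdfs://$NN1:8020/user/hadoop/dir2/" "hdfs://$NN2:8020/user/hadoop/dir2/"
fi
```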

Step 6: Getting help

You can use the help command to get the list of commands supported by the Hadoop Distributed File System (HDFS) shell:

            hadoop fs -help

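Help is also available per command; for example (sandbox sketch):

```shell
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -help       # list every fs shell command with usage
  hadoop fs -help ls    # detailed help for a single command
fi
```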

Hope this short tutorial was useful for getting the basics of file management.


Goun Na
May 3, 2014 at 6:09 pm

Very basic and essential commands. Thanks!

September 23, 2014 at 11:21 am

Thanks. I am starting to learn about Hadoop after almost 20 years of Oracle/DBA experience. What is the best starting book for administrators? not developers – I know you have to have some developer background but I really mainly focus on architecture and design.
Thanks again this was good start for me.

    Jules S. Damji
    September 24, 2014 at 2:38 pm

    There are a number of books for DevOps and administrators. A good book to begin with is Hadoop Operations. Since you come from a relational DB background, and most data repositories in Hadoop are NoSQL, you might want to read up on NoSQL concepts and semantics, for the access and storage patterns are different.

    Wiki pages for NoSQL databases like MongoDB, Cassandra, HBase, etc. are a good start. Another way to get familiar with Hadoop is to download the Hortonworks Sandbox. We have some tutorials targeted toward administrators.


September 24, 2014 at 5:52 am

How do I mount HDFS on the local system? I am using the HDP 2.1 sandbox on VirtualBox, a ready-made image downloaded from this site.
I tried multiple ways but keep getting errors.

February 4, 2015 at 2:37 am

It is really good

February 10, 2015 at 7:07 am

A small detail, but it seems Step 3 has gone missing from this page. Otherwise very good information here.

February 15, 2015 at 10:01 pm

What is the command used to load a folder of files into HDFS from local?

August 3, 2015 at 12:03 pm

I have just begun to learn hadoop and I am sorry for asking such a silly question. I have got 1 namenode,1 secondary node and 3 data nodes. Where do we run these HDFS commands from .. Do we run it from the name node ? or do we do it from one of the data node ?

    December 16, 2015 at 12:33 am

    You have to open the terminal and
    execute sudo jps to see if everything is up and running,
    then
    execute the hadoop fs shell command.
    If you have hadoop and java in your path, everything should work fine.

August 5, 2015 at 10:48 pm

We have a directory with a list of files in it. I want to view the latest file added to Hadoop. Is there any grep command to get the name of the file that was added to Hadoop?

    December 16, 2015 at 12:34 am

    hadoop fs -ls [path] (use the complete path)

malek BS
January 12, 2016 at 8:02 am

I’m new to hadoop and I’m currently using the Hortonworks HDP Sandbox to collect tweets with Flume.
I can’t access HDFS, and I tried to create a directory using:
hadoop fs -mkdir hdfs://localhost:9000/user/flume/tweets_data
But I always get the following error: failed on connection exception : connection refused
Sorry to ask the question here, but I did not find any good answer

January 16, 2016 at 3:02 am


1. Make sure the NameNode, DataNodes, Secondary NameNode, JobTracker, and TaskTrackers are all up and running.
2. Make sure the ports are opened.
3. Make sure the Java installation was proper.

January 27, 2016 at 10:20 pm

hadoop fs -ls hdfs://
ls: Call From to failed on connection exception: Connection refused; For more details see:

How do I fix this issue?
