Using the command line to manage files on HDFS
In this tutorial we will walk through some of the basic HDFS commands you will need to manage files on HDFS. To complete this tutorial you will need a working HDP cluster; the easiest way to get a Hadoop cluster is to download the Hortonworks Sandbox. You can SSH to the sandbox using a tool such as PuTTY. Let's get started and execute the following commands in order.
Create directories: hadoop fs -mkdir
- Takes path URIs as arguments and creates a directory or directories
Usage: hadoop fs -mkdir <paths>
Example:
hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
hadoop fs -mkdir hdfs://nn1.example.com/user/hadoop/dir
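If you don't have a cluster handy yet, the multi-path behaviour is easy to see with plain `mkdir` on the local filesystem. This is only an illustrative analogue (the `/tmp` paths below are made up for the demo), not an HDFS command:

```shell
# Local analogue of `hadoop fs -mkdir <paths>`: like the HDFS command,
# plain mkdir accepts several target paths in a single invocation.
mkdir -p /tmp/hdfs_mkdir_demo/dir1 /tmp/hdfs_mkdir_demo/dir2

# List what was created
ls /tmp/hdfs_mkdir_demo
```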
List contents: hadoop fs -ls
- Lists the contents of a directory
- For a file, returns the file's stats
Usage: hadoop fs -ls <args>
Example:
hadoop fs -ls /user/hadoop/dir1 /user/hadoop/dir2
hadoop fs -ls /user/hadoop/dir1/filename.txt
hadoop fs -ls hdfs://<hostname>:9000/user/hadoop/dir1/
Let's touch a file locally so we have something to work with:
$ touch filename.txt
Show sizes: hadoop fs -du
- Displays the sizes of files and directories contained in the given directory, or the size of a file if the path is just a file
Usage: hadoop fs -du URI
Example: hadoop fs -du /user/hadoop/ /user/hadoop/dir1/Sample.txt
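To see what a per-file size report looks like without a cluster, you can compare against byte counts on the local filesystem. This is an illustrative analogue only (the `/tmp` demo path is made up), since `hadoop fs -du` reports sizes for HDFS paths:

```shell
# Create a file with a known size: printf '12345' writes exactly 5 bytes
mkdir -p /tmp/hdfs_du_demo
printf '12345' > /tmp/hdfs_du_demo/sample.txt

# Report its size in bytes, analogous to the size column of `hadoop fs -du`
wc -c < /tmp/hdfs_du_demo/sample.txt
```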
Upload: hadoop fs -put
- Copies a single src file, or multiple src files, from the local file system to the Hadoop file system
Usage: hadoop fs -put <localsrc> ... <HDFS_dest_Path>
Example: hadoop fs -put /home/ec2-user/Samplefile.txt ./ambari.repo /user/hadoop/dir3/
Download: hadoop fs -get
- Copies/downloads files from HDFS to the local file system
Usage: hadoop fs -get <hdfs_src> <localdst>
Example: hadoop fs -get /user/hadoop/dir3/Samplefile.txt /home/
Merge and download: hadoop fs -getmerge
- Takes a source directory as input and concatenates the files in src into a single destination local file
Usage: hadoop fs -getmerge <src> <localdst> [addnl]
Example: hadoop fs -getmerge /user/hadoop/dir1/ ./Samplefile2.txt
Option: addnl can be set to add a newline at the end of each file
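The effect of getmerge is easy to picture with a local analogue: concatenating every file in a source directory into one destination file. This sketch uses made-up `/tmp` paths and plain `cat`, not HDFS, purely to show the semantics:

```shell
# Build a directory with two small "part" files, like MapReduce output
mkdir -p /tmp/getmerge_demo
printf 'line from file1\n' > /tmp/getmerge_demo/part-0
printf 'line from file2\n' > /tmp/getmerge_demo/part-1

# Concatenate them into a single local file, as getmerge would
cat /tmp/getmerge_demo/part-0 /tmp/getmerge_demo/part-1 > /tmp/merged.txt

# The merged file now holds the contents of both inputs, in order
cat /tmp/merged.txt
```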
Distributed copy: hadoop distcp
- Copies files or directories recursively
- A tool used for large inter-/intra-cluster copying
- Uses MapReduce to effect its distribution, error handling and recovery, and reporting
You can use the following steps to try out getmerge and distcp. First, upload two files for this exercise:
Usage: hadoop distcp <srcurl> <desturl>
Example:
hadoop distcp hdfs://<NameNode1>:8020/user/hadoop/dir1/ \
hdfs://<NameNode2>:8020/user/hadoop/dir2/
# touch txt1 txt2
# hadoop fs -put txt1 txt2 /user/hadoop/dir2/
# hadoop fs -ls /user/hadoop/dir2/
# hadoop fs -getmerge /user/hadoop/dir2/ ./merged.txt
Help: hadoop fs -help
- Displays help for the given command, or for all commands if none is specified
Usage: hadoop fs -help [cmd]
Example: hadoop fs -help

Hope this short tutorial was useful to get the basics of file management on HDFS.