
Using the Command Line to Manage Files on HDFS

Introduction

In this tutorial, we will walk through some of the basic commands you will need to manage files on the Hadoop Distributed File System (HDFS).

Prerequisites

1.) Open your vi editor with the following command:

vi popularNames.txt

Note: You can create any text file with whatever data you want, or you can follow the example below.

2.) In vi, press i to insert text. Copy and paste the following data into the text file:

Rank  Male       Female
1     Noah       Emma
2     Liam       Olivia
3     Mason      Sophia
4     Jacob      Isabella
5     William    Ava
6     Ethan      Mia
7     Michael    Emily
8     Alexander  Abigail
9     James      Madison
10    Daniel     Charlotte

3.) Press the Esc key, then type :wq and press Enter to save and quit vi.
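If you prefer not to use vi, the same file can be created non-interactively with a shell here-document (a sketch; any text editor works equally well):

```shell
# Create popularNames.txt in the current directory without opening an editor.
cat > popularNames.txt << 'EOF'
Rank  Male       Female
1     Noah       Emma
2     Liam       Olivia
3     Mason      Sophia
4     Jacob      Isabella
5     William    Ava
6     Ethan      Mia
7     Michael    Emily
8     Alexander  Abigail
9     James      Madison
10    Daniel     Charlotte
EOF

# Verify the file was written: 11 lines (header plus ten names).
wc -l popularNames.txt
```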


In the example above, popularNames.txt is located in the home (~) directory.


Step 1: Create a Directory in HDFS, Upload a File, and List Contents

Let’s learn the syntax by example. You can copy and paste the following commands into your terminal:

hadoop fs -mkdir:

  • Takes one or more path URIs as arguments and creates a directory at each path.
# Usage:
        # hadoop fs -mkdir <paths>
# Example:
        hadoop fs -mkdir /user/hadoop
        hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2 /user/hadoop/dir3


hadoop fs -put:

  • Copies one or more source files from the local file system to the Hadoop Distributed File System.
# Usage:
        # hadoop fs -put <local-src> ... <HDFS_dest_path>
# Example:
        hadoop fs -put popularNames.txt /user/hadoop/dir1/popularNames.txt


hadoop fs -ls:

  • Lists the contents of a directory
  • For a file, returns the file’s stats
# Usage:
        # hadoop fs -ls <args>
# Example:
        hadoop fs -ls /user/hadoop
        hadoop fs -ls /user/hadoop/dir1
        hadoop fs -ls /user/hadoop/dir1/popularNames.txt


Step 2: Find Out Space Utilization in an HDFS Directory

hadoop fs -du:

  • Displays the size of the files and directories contained in the given directory, or the size of a file if the path is just a file.
# Usage:
        # hadoop fs -du URI
# Example:
        hadoop fs -du /user/hadoop/ /user/hadoop/dir1/popularNames.txt


Step 3: Download File From HDFS to Local File System

hadoop fs -get:

  • Copies/Downloads files from HDFS to the local file system
# Usage:
        # hadoop fs -get <hdfs_src> <localdst>
# Example:
        hadoop fs -get /user/hadoop/dir1/popularNames.txt /home/


Step 4: Explore Two Advanced Features

hadoop fs -getmerge:

  • Takes a source directory and a local destination file as input, and concatenates the files in src into the destination file.
# Usage:
        # hadoop fs -getmerge <src> <localdst> [addnl]
# Option:
        # addnl: can be set to add a newline at the end of each file
# Example:
        hadoop fs -getmerge /user/hadoop/dir1/  ./popularNamesV2.txt


This concatenates all files from dir1 into popularNamesV2.txt.
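The merge behaviour can be previewed with plain shell commands: concatenating every file in a local directory with cat mirrors what -getmerge does on HDFS. This is only a local analogy, not an HDFS command, and the part-0000x filenames are hypothetical stand-ins:

```shell
# Local analogy for -getmerge: concatenate every file in a directory
# into one destination file. (Not an HDFS command.)
mkdir -p dir1
printf '1\tNoah\tEmma\n'   > dir1/part-00000
printf '2\tLiam\tOlivia\n' > dir1/part-00001

# Analogous to: hadoop fs -getmerge /user/hadoop/dir1/ ./popularNamesV2.txt
cat dir1/* > popularNamesV2.txt

cat popularNamesV2.txt
```

Note that cat relies on the shell's lexicographic glob ordering, which matches the order -getmerge visits files in a directory listing.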

hadoop distcp:

  • Copies files or directories recursively
  • It is a tool used for large inter-/intra-cluster copying
  • It uses MapReduce to carry out its distribution, error handling and recovery, and reporting
# Usage:
        # hadoop distcp <src-url> <dest-url>
# Example:
        hadoop distcp /user/hadoop/dir1/ /user/hadoop/dir3/


distcp: copies dir1 and all its contents to dir3


Step 5: Use the Help Command to Access the Hadoop Command Manual

The help command lists the commands supported by the Hadoop Distributed File System (HDFS). You can also pass a command name to get help for a specific command (for example, hadoop fs -help ls).

# Example:
        hadoop fs -help


We hope this short tutorial was useful for picking up the basics of file management on HDFS.

Summary

Congratulations! We just learned to use commands to manage our files in HDFS. We know how to create, upload, and list the contents of our directories. We also acquired the skills to download files from HDFS to our local file system and explored a few advanced features of the command line.

Tutorial Q&A and Reporting Issues

If you need help or have questions about this tutorial, please first check HCC for existing answers to questions on this tutorial using the Find Answers button. If you don’t find your answer, you can post a new HCC question for this tutorial using the Ask Questions button.



Tutorial Name: Using the Command Line to Manage Files on HDFS

HCC Tutorial Tag: tutorial-120 and hdp-2.4.0

If the tutorial has multiple labs, please indicate which lab your question corresponds to, and provide any feedback related to that lab.

All Hortonworks, partner and community tutorials are posted in the Hortonworks github and can be contributed via the Hortonworks Tutorial Contribution Guide. If you are certain there is an issue or bug with the tutorial, please create an issue on the repository and we will do our best to resolve it!