HOWTO: Test HDFS Setup

ISSUE

How do I run simple Hadoop Distributed File System (HDFS) tasks, or test that HDFS services are working?

SOLUTION

Make sure the name node and the data nodes are started.

To start the name node:

su - hdfs -c "hadoop-daemon.sh --config /etc/hadoop start namenode"

To start a data node:

su - hdfs -c "hadoop-daemon.sh --config /etc/hadoop start datanode"
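
Before moving on, it can help to confirm that both daemons actually came up. The following is a minimal sketch, assuming the JDK's jps tool is on the PATH; the JPS_BIN variable and the function names are hypothetical helpers, not part of Hadoop.

```shell
#!/bin/sh
# Sketch: report whether the expected Hadoop daemons show up in jps output.
# JPS_BIN is a hypothetical override, handy for pointing at a full JDK path.
JPS_BIN="${JPS_BIN:-jps}"

# daemon_running NAME: succeeds if jps lists a JVM whose main class is NAME.
daemon_running() {
    # jps prints one "<pid> <main class>" pair per line
    "$JPS_BIN" | awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# check_daemons NAME...: print one status line per daemon name.
check_daemons() {
    for daemon in "$@"; do
        if daemon_running "$daemon"; then
            echo "$daemon: running"
        else
            echo "$daemon: NOT running"
        fi
    done
}
```

A typical call would be "check_daemons NameNode DataNode", run as the user that owns the daemons (hdfs here), since jps normally lists only the current user's JVMs.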

Put data files into HDFS. This command takes a file from the local disk and copies it into HDFS:

su hdfs
hadoop fs -put trial_file.csv /user/hdfs/trial_file.csv

Read data from HDFS. This command reads the contents of a file from HDFS and displays them on the console:

su hdfs
hadoop fs -cat /user/hdfs/trial_file.csv
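
The put and cat steps above can be combined into a single round-trip check. This is only a sketch: HADOOP_BIN and hdfs_roundtrip are hypothetical names introduced for illustration, and the comparison relies on the standard cmp utility.

```shell
#!/bin/sh
# Sketch: upload a local file, read it back, and compare the bytes.
# HADOOP_BIN is a hypothetical override for the hadoop launcher.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# hdfs_roundtrip LOCAL_FILE HDFS_PATH
hdfs_roundtrip() {
    local_file="$1"
    hdfs_path="$2"
    # Copy the local file into HDFS
    "$HADOOP_BIN" fs -put "$local_file" "$hdfs_path" || return 1
    # Stream the file back and compare it with the local copy byte for byte
    if "$HADOOP_BIN" fs -cat "$hdfs_path" | cmp -s - "$local_file"; then
        echo "OK: $hdfs_path matches $local_file"
    else
        echo "MISMATCH: $hdfs_path differs from $local_file"
        return 1
    fi
}
```

For the example file used here, the call would be "hdfs_roundtrip trial_file.csv /user/hdfs/trial_file.csv", run as the hdfs user. The put step fails if the destination already exists, so a rerun needs a fresh HDFS path or a prior cleanup.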

References:

http://hadoop.apache.org/common/docs/current/file_system_shell.html


Best Practices: Linux File Systems for HDFS

ISSUE:

Choosing the appropriate Linux file system for HDFS deployment

SOLUTION:

The Hadoop Distributed File System is platform independent and can function on top of any underlying file system and Operating System. Linux offers a variety of file system choices, each with caveats that have an impact on HDFS.

As a general best practice, if you are mounting disks solely for Hadoop data, mount them with the ‘noatime’ option. This disables access-time updates, so a read no longer triggers a metadata write, which speeds up file reads.
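
The noatime mount option stops the kernel from updating a file's access time on every read. As a rough sketch, the helper below checks whether a given mount point currently carries that option by reading /proc/mounts; the function name, the device /dev/sdb1 and the mount point /grid/0 are hypothetical examples.

```shell
#!/bin/sh
# Sketch: check whether a mount point is currently mounted with noatime.
# Example /etc/fstab entry (device and mount point are hypothetical):
#   /dev/sdb1  /grid/0  ext3  defaults,noatime  0 0
# Apply to an already-mounted disk without a reboot (run as root):
#   mount -o remount,noatime /grid/0

# has_noatime MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed with the noatime option.
# MOUNTS_FILE defaults to /proc/mounts, whose fields are:
# device, mount point, file system type, options, dump, pass.
has_noatime() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noatime") found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

For example, "has_noatime /grid/0 && echo ok" prints ok only when that mount point is listed with noatime.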

Three Linux file systems are popular choices:

  • Ext3
  • Ext4
  • XFS

Yahoo uses the ext3 file system for its Hadoop deployments. ext3 is also the default file system for many popular Linux distributions. Since HDFS on ext3 has been publicly tested on Yahoo’s cluster, it is a safe choice for the underlying file system.

ext4 is the successor to ext3. ext4 has better performance with large files.…


HOWTO: Check the Health of an HDFS Cluster

ISSUE

How do I check the health of my HDFS cluster (name node and all data nodes)?

SOLUTION

Hadoop includes the dfsadmin command line tool for HDFS administration functionality. This tool allows the user to view the status of the HDFS cluster.

To view a comprehensive status report, execute the following command:

hadoop dfsadmin -report

This command outputs basic statistics on cluster health, including the status of the namenode, the status of each datanode, disk capacity figures, and block health.
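
For scripted checks, the block health counters can be pulled out of the report. The sketch below assumes the report contains lines labelled "Under replicated blocks:", "Blocks with corrupt replicas:" and "Missing blocks:"; the exact labels vary between Hadoop versions, so treat them as assumptions to verify against your own output. The function name summarize_report is hypothetical.

```shell
#!/bin/sh
# Sketch: summarise block health from a dfsadmin report read on stdin.
# The line labels matched below are assumptions; check them against
# the report your Hadoop version actually prints.
summarize_report() {
    awk -F': *' '
        /^Under replicated blocks:/      { under = $2 }
        /^Blocks with corrupt replicas:/ { corrupt = $2 }
        /^Missing blocks:/               { missing = $2 }
        END {
            printf "under_replicated=%d corrupt=%d missing=%d\n", under, corrupt, missing
            # Non-zero exit status when any block is corrupt or missing
            exit (corrupt + missing > 0)
        }'
}
```

A typical call would be "hadoop dfsadmin -report | summarize_report". The exit status makes it easy to hook into a cron job or monitoring script.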

The same information is available on the NameNode web status page at http://<namenode IP>:50070/dfshealth.jsp

References:
http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#DFSAdmin+Command

