
Hadoop Tutorial – Getting Started with HDP

Loading Sensor Data into HDFS

Introduction

In this section, you will download the sensor data and load it into HDFS using Ambari User Views. You will be introduced to the Ambari Files User View, which lets you manage files: create directories, navigate the file system, and upload files to HDFS, along with a few other file-related tasks. Once you have the basics, you will create a directory and then load two files into HDFS using the Ambari Files User View.

Prerequisites

This tutorial is part of a series of hands-on tutorials to get you started on HDP using the Hortonworks Sandbox. Please ensure you complete the prerequisites before proceeding with this tutorial.

HDFS backdrop

As data grows, it eventually exceeds the storage capacity of a single physical machine. This growth drives the need to partition data across separate machines. A file system that manages the storage of data across a network of machines is called a distributed file system. HDFS is a core component of Apache Hadoop and is designed to store large files with streaming data access patterns, running on clusters of commodity hardware. With Hortonworks Data Platform (HDP), HDFS has been expanded to support heterogeneous storage media within the HDFS cluster.
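
To make the idea of partitioning data across machines concrete, here is a toy sketch in Python. It only illustrates the concept: the function names, node names, and block size are hypothetical, and real HDFS block placement is decided by the NameNode with replication and rack awareness, which this sketch does not model.

```python
# Toy illustration only; not how HDFS actually places blocks.

def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(num_blocks, nodes):
    """Spread block indices over the given nodes in round-robin order."""
    placement = {node: [] for node in nodes}
    for index in range(num_blocks):
        placement[nodes[index % len(nodes)]].append(index)
    return placement


# A 10-byte "file" cut into 4-byte blocks and spread over two hypothetical nodes.
blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = assign_blocks(len(blocks), ["node-a", "node-b"])
print(placement)
```

The file no longer has to fit on one machine: each node holds only some of the blocks, and a central index (the placement dictionary here) records where each block lives.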

Step 1 – Download and Extract Sensor Data Files

  1. Download the sample sensor data contained in a compressed (.zip) folder here:  Geolocation.zip
  2. Save the Geolocation.zip file to your computer, then extract the files. You should see a Geolocation folder that contains the following files:
    • geolocation.csv – This is the collected geolocation data from the trucks. It contains records showing truck location, date, time, type of event, speed, etc.
    • trucks.csv – This data was exported from a relational database. It shows information on truck models, driverid, truckid, and aggregated mileage.
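
If you prefer to extract and inspect the files from a script, the following Python sketch uses only the standard library. It assumes Geolocation.zip was saved to the current working directory; the helper names extract_archive and preview_csv are hypothetical, and the exact layout inside the archive is not assumed.

```python
import csv
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of the archive and return the sorted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())


def preview_csv(csv_path, max_rows=3):
    """Return the header row and up to max_rows data rows of a CSV file."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows


if __name__ == "__main__":
    archive_path = Path("Geolocation.zip")  # assumed download location
    if archive_path.exists():
        for name in extract_archive(archive_path, "."):
            print(name)
```

Calling preview_csv on an extracted CSV file is a quick way to confirm the column names before uploading anything to HDFS.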

Step 2 – Load the Sensor Data into HDFS

1. Log on to Ambari with the username and password maria_dev/maria_dev.

2. Go to Ambari Dashboard and open Files View.

3. Start at the root of the HDFS file system. You will see all the files that the logged-in user (maria_dev in this case) has permission to see.

4. Navigate to the /user/maria_dev directory by clicking on the directory links.

5. Let’s create a data directory to hold the data we are going to use for this use case. Click the new folder button to create a directory named data inside the maria_dev directory. Then navigate into the data directory.

Upload Geolocation and Trucks CSV Files to data Folder

1. If you’re not already in your newly created directory, /user/maria_dev/data, go to the data folder. Then click the upload button to upload the geolocation.csv and trucks.csv files into it.

2. An Upload file window will appear. Click on the cloud symbol.

3. Another window will appear. Navigate to the location where the two CSV files were extracted. Select one file at a time and press Open to complete the upload. Repeat the process until both files are uploaded.

Both files are now uploaded to HDFS and appear in the data folder in the Files View UI.

You can also perform the following operations on a file or folder by clicking on the entity’s row: Open, Rename, Permissions, Delete, Copy, Move, Download, and Concatenate.
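
The directory creation done above through the Files View can also be scripted against the WebHDFS REST API, which exposes operations such as MKDIRS and CREATE under the /webhdfs/v1 path. The Python sketch below builds those request URLs. The host localhost and port 50070 are assumptions about your sandbox setup, WebHDFS must be enabled on the cluster, and the helper names webhdfs_url and mkdirs are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def webhdfs_url(host, port, hdfs_path, op, user, **params):
    """Build a WebHDFS URL for the given operation on an HDFS path."""
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def mkdirs(host, port, hdfs_path, user):
    """Send a MKDIRS request (HTTP PUT) and return the raw response body."""
    request = Request(webhdfs_url(host, port, hdfs_path, "MKDIRS", user), method="PUT")
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Assumed sandbox host and port; adjust them to match your environment.
url = webhdfs_url("localhost", 50070, "/user/maria_dev/data", "MKDIRS", "maria_dev")
print(url)
```

Uploading a file with the CREATE operation takes two requests: the NameNode answers the first PUT with a redirect to a DataNode, and the file contents are sent in a second PUT to that location. That flow is left out here to keep the sketch short.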

Set Write Permissions on the data Folder

  1. Click on the data folder’s row, which is in the /user/maria_dev directory.
  2. Click Permissions.
  3. Make sure that all the write boxes are checked (a checked box has a blue background).
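
HDFS uses a POSIX-like permission model with read, write, and execute bits for the owner, the group, and all other users. As a rough sketch, the Python below shows how a set of checked boxes maps to the octal notation commonly used for such permissions. The function names are hypothetical and this is not code from Ambari.

```python
def octal_mode(user, group, other):
    """Convert three (read, write, execute) boolean tuples to an octal string."""
    def digit(bits):
        read, write, execute = bits
        return 4 * read + 2 * write + 1 * execute
    return f"{digit(user)}{digit(group)}{digit(other)}"


def all_can_write(mode):
    """Return True when the write bit (value 2) is set in every octal digit."""
    return all(int(d) & 2 for d in mode)


# Every box checked for owner, group, and others.
everything = (True, True, True)
mode = octal_mode(everything, everything, everything)
print(mode, all_can_write(mode))
```

A mode such as 755 would fail the all_can_write check, because only the owner has the write bit set.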

Summary

Congratulations! Let’s summarize the skills and knowledge we acquired from this tutorial. We learned that the Hadoop Distributed File System (HDFS) was built to manage the storage of data across multiple machines. We can now upload data into HDFS using the Ambari Files View.
