Hadoop Tutorial – Getting Started with HDP

Loading Sensor Data into HDFS

Introduction

In this section, you will download the sensor data and load it into HDFS using Ambari User Views. You will be introduced to the Ambari Files User View, which lets you manage files in HDFS: you can create directories, navigate the file system, upload files, and perform a few other file-related tasks. Once you have the basics, you will create a data directory and then load two files into HDFS using the Ambari Files User View.

Prerequisites

This tutorial is part of a series of hands-on tutorials to get you started on HDP using the Hortonworks Sandbox. Please ensure you complete the prerequisites before proceeding with this tutorial.

Outline

HDFS backdrop

As data grows, a single physical machine eventually becomes saturated by its storage capacity. This growth drives the need to partition data across separate machines. A file system that manages the storage of data across a network of machines is called a distributed file system. HDFS is a core component of Apache Hadoop and is designed to store large files with streaming data access patterns, running on clusters of commodity hardware. With Hortonworks Data Platform (HDP), HDFS has been expanded to support heterogeneous storage media within the HDFS cluster.
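
Although this tutorial uses the Ambari Files User View, HDFS also exposes a WebHDFS REST API that any HTTP client can call. As a minimal sketch (assuming the sandbox's NameNode web endpoint at sandbox.hortonworks.com:50070 and the maria_dev user; adjust both to your environment), the following Python snippet lists the contents of the HDFS root:

    # Minimal sketch: list the HDFS root over the WebHDFS REST API.
    # Assumptions: the sandbox exposes WebHDFS at the host/port below and
    # the user maria_dev exists; adjust both to your environment.
    import requests

    WEBHDFS = "http://sandbox.hortonworks.com:50070/webhdfs/v1"

    resp = requests.get(f"{WEBHDFS}/?op=LISTSTATUS&user.name=maria_dev")
    resp.raise_for_status()

    for entry in resp.json()["FileStatuses"]["FileStatus"]:
        print(entry["type"], entry["pathSuffix"])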

Step 1 – Download and Extract Sensor Data Files

  1. Download the sample sensor data contained in a compressed (.zip) folder here:  Geolocation.zip
  2. Save the Geolocation.zip file to your computer, then extract the files. You should see a Geolocation folder that contains the following files:
    • geolocation.csv – This is the collected geolocation data from the trucks. It contains records showing truck location, date, time, type of event, speed, etc.
    • trucks.csv – This data was exported from a relational database and shows information on truck models, driverid, truckid, and aggregated mileage info.
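
Before uploading, you can optionally sanity-check the extracted files. The sketch below (which assumes both CSVs sit in your current working directory) prints each file's column names and row count:

    # Minimal sketch: peek at the extracted CSV files before uploading them.
    # Assumes geolocation.csv and trucks.csv are in the current directory.
    import csv

    for name in ("geolocation.csv", "trucks.csv"):
        with open(name, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)          # first line holds the column names
            rows = sum(1 for _ in reader)  # remaining lines are data records
        print(f"{name}: {len(header)} columns, {rows} rows")
        print("  columns:", ", ".join(header))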

Step 2 – Load the Sensor Data into HDFS

1. Log on to Ambari using the credentials maria_dev/maria_dev.

2. Go to Ambari Dashboard and open Files View.

3. Starting from the top root of the HDFS file system, you will see all the files the logged-in user (maria_dev in this case) has access to:

4. Navigate to /user/maria_dev directory by clicking on the directory links.

5. Let’s create a data directory to hold the files we are going to use for this use case. Click the new folder button to create the data directory inside the maria_dev directory, then navigate into the data directory.

Upload Geolocation and Trucks CSV Files to data Folder

1. If you’re not already in your newly created directory path /user/maria_dev/data, go to the data folder. Then click the upload button to upload the geolocation.csv and trucks.csv files into it.

2. An upload file window will appear; click on the cloud symbol.

3. Another window will appear; navigate to the location where the two CSV files were downloaded. Select one at a time and press Open to complete the upload. Repeat the process until both files are uploaded.

Both files are now uploaded to HDFS and appear in the Files View UI.

You can also perform the following operations on a file or folder by clicking on the entity’s row: Open, Rename, Permissions, Delete, Copy, Move, Download and Concatenate.
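
The same directory creation and upload can also be scripted against the WebHDFS REST API instead of clicking through the Files View. The following is only a sketch under the same assumptions as before (sandbox host, default WebHDFS port 50070, the maria_dev user, and both CSVs in the current directory):

    # Minimal sketch: create /user/maria_dev/data and upload both CSVs
    # through WebHDFS. Host, port and local file locations are assumptions;
    # adjust them to your sandbox.
    import requests

    WEBHDFS = "http://sandbox.hortonworks.com:50070/webhdfs/v1"
    USER = "maria_dev"

    # Equivalent of the new folder button: create the data directory.
    requests.put(
        f"{WEBHDFS}/user/{USER}/data?op=MKDIRS&user.name={USER}"
    ).raise_for_status()

    for name in ("geolocation.csv", "trucks.csv"):
        create = f"{WEBHDFS}/user/{USER}/data/{name}?op=CREATE&overwrite=true&user.name={USER}"
        # WebHDFS writes happen in two steps: the NameNode replies with a
        # redirect to a DataNode, and the file body is sent to that address.
        redirect = requests.put(create, allow_redirects=False)
        with open(name, "rb") as f:
            requests.put(redirect.headers["Location"], data=f).raise_for_status()
        print("uploaded", name)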

Set Write Permissions to Write to data Folder

  1. Click on the data folder’s row, which is contained within the directory path /user/maria_dev.
  2. Click Permissions.
  3. Make sure that all of the Write checkboxes are checked (their background turns blue).
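
If you prefer to script this step as well, the permission change can be made through WebHDFS. The sketch below sets the directory to mode 777 (read, write and execute for everyone), which mirrors ticking every Write box on top of the usual read/execute bits; host and port remain assumptions:

    # Minimal sketch: set /user/maria_dev/data to mode 777 through WebHDFS,
    # mirroring the Permissions dialog. Host and port are assumptions.
    import requests

    WEBHDFS = "http://sandbox.hortonworks.com:50070/webhdfs/v1"

    requests.put(
        f"{WEBHDFS}/user/maria_dev/data"
        "?op=SETPERMISSION&permission=777&user.name=maria_dev"
    ).raise_for_status()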


Summary

Congratulations! Let’s summarize the skills and knowledge we acquired from this tutorial. We learned that the Hadoop Distributed File System (HDFS) was built to manage storing data across multiple machines. Now we can upload data into HDFS using Ambari’s Files User View.

Further Reading
