If you run into any errors while completing this tutorial, please ask questions or notify us on Hortonworks Community Connection!
In this tutorial, you will do the following:
This example code is derived from Concurrent Inc.’s training class by Alexis Roos (@alexisroos). It demonstrates how simple it is to use the Cascading Java framework to write MapReduce jobs, without using the MapReduce API directly, to parse a large file for analysis. The example merely lists the top ten IP addresses found in an access log, but Cascading is capable of far more; the example serves as an introduction to its potential and its simplicity.
ssh -p 2222 email@example.com
cd ~
wget https://services.gradle.org/distributions/gradle-1.9-bin.zip
unzip gradle-1.9-bin.zip
chmod +x gradle-1.9/bin/gradle
git clone git://github.com/dmatrix/examples.git
~/gradle-1.9/bin/gradle clean jar
hdfs dfs -mkdir /user/guest/logs
hdfs dfs -mkdir /user/guest/output
hdfs dfs -copyFromLocal ./NASA_access_log_Aug95.txt /user/guest/logs
hadoop jar ./build/libs/dataprocessing.jar /user/guest/logs /user/guest/output/logs
This run should create the following output:
Once the job is submitted (or running), you can visually track its progress from the MapReduce Job Browser. Log in to Ambari and click MapReduce 2, then use Quick Links to get to the JobHistory UI.
You can drill down on any of the links to explore further details about the MapReduce jobs running in their respective YARN containers. For example, clicking on one of the job IDs will show all the map and reduce tasks created for that job.
When the job is finished, the top 10 IP addresses are written to the HDFS file part-00000. Use the Ambari HDFS Files view to navigate to the HDFS directory /user/guest/output/logs and view its contents.
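As a quick sanity check, you can compute a comparable top-ten list locally with standard Unix tools and compare it against part-00000. The following is a minimal sketch and is not part of the Cascading application: the helper name top_ips is hypothetical, and it assumes the client address is the first whitespace-separated field of each log line. The job's own parsing may differ, so treat the counts as a rough comparison only.

```shell
#!/bin/sh
# Hypothetical helper (not from the tutorial's repository):
# count requests per client address and keep the most frequent ones.
# Assumption: the client address is the first whitespace-separated field.
top_ips() {
    # $1: path to a local access log
    # $2: number of entries to keep (defaults to 10)
    awk '{ print $1 }' "$1" \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2,2 \
        | head -n "${2:-10}"
}
```

For example, running top_ips ./NASA_access_log_Aug95.txt from the directory that holds the log file prints one line per address, with the request count first and the address second.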
Voila! You have written a Cascading log processing application, executed it on the Hortonworks HDP Sandbox, and perused the respective MapReduce jobs and the output generated.
In the next tutorial, we will examine how to use Cascading Driven to discover in-depth information on the Flow (including logical, physical, and performance views).
We hope you enjoyed the tutorial! If you’ve had any trouble completing this tutorial or require assistance, please head on over to Hortonworks Community Connection where hundreds of Hadoop experts are ready to help!