If you run into any errors while completing this tutorial, please ask questions or notify us on Hortonworks Community Connection!
To start this tutorial, you must do two things: first, download the Sandbox and follow the installation instructions; second, download the Cascading SDK.
The example WordCount is derived from part 2 of the Cascading Impatient Series.
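Conceptually, the WordCount flow tokenizes each line of the input file and tallies how often each token appears. Here is a rough Python sketch of that logic (an approximation for illustration only, not the Cascading API; the actual example defines its own token regex):

```python
import re
from collections import Counter

def word_count(lines):
    """Tokenize each line and tally token occurrences."""
    counts = Counter()
    for line in lines:
        # Simple alphanumeric tokenizer; the Cascading example uses
        # its own delimiter regex, so treat this as an approximation.
        counts.update(re.findall(r"[A-Za-z0-9]+", line))
    return counts

sample = ["rain shadow rain", "dry shadow"]
print(sorted(word_count(sample).items()))  # [('dry', 1), ('rain', 2), ('shadow', 2)]
```

The Cascading version expresses the same pipeline as a flow of pipes and taps, which the framework compiles into MapReduce jobs.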
su cascade (logs you in as the cascade user)
chmod +x gradle-1.9/bin/gradle
git clone git://github.com/Cascading/Impatient.git
~/gradle-1.9/bin/gradle clean jar (this builds the impatient.jar file, which is your WordCount unit of execution)
Now you’re ready to run and deploy your impatient.jar file onto the cluster.
cd /home/cascade/Impatient/part2
hadoop fs -mkdir -p /user/cascade/data/
hadoop fs -copyFromLocal data/rain.txt /user/cascade/data/
hadoop jar ./build/libs/impatient.jar data/rain.txt output/wc
The hadoop jar command will produce the following output:
Once the job is submitted (or running), you can track its progress from the Sandbox MapReduce Job Browser: click on Job History UI.
By default, it displays all jobs run by the user. Look for the most recent one, which should be owned by the user cascade.
When the job finishes, the word counts are written to the HDFS file output/wc/part-00000. Use the Sandbox’s HDFS Files view to navigate to that directory and view its contents.
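If you want to inspect the results programmatically, each output row pairs a token with its count. A small Python sketch (this assumes tab-separated token/count rows; check the actual format of your part-00000 file):

```python
def top_tokens(tsv_lines, n=3):
    """Parse "token<TAB>count" rows and return the n highest counts."""
    pairs = []
    for line in tsv_lines:
        # Row format is assumed; adjust the delimiter if yours differs.
        token, count = line.rstrip("\n").split("\t")
        pairs.append((token, int(count)))
    return sorted(pairs, key=lambda p: -p[1])[:n]

print(top_tokens(["rain\t17", "shadow\t3", "dry\t9"], n=2))  # [('rain', 17), ('dry', 9)]
```

You could feed this the output of `hadoop fs -cat output/wc/part-00000` to see the most frequent words at a glance.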
For the adventurous, you can try the entire Impatient Series after you have downloaded the sources from GitHub. Beyond the Impatient series, there are other tutorials and case examples to play with.
We hope you enjoyed the tutorial! If you’ve had any trouble completing this tutorial or require assistance, please head on over to Hortonworks Community Connection where hundreds of Hadoop experts are ready to help!