
MapReduce Forum

Creating own MapReduce word count example

  • #28971

    Hi All,
    I am new to Hadoop and the Hortonworks Sandbox. I just want to run a simple word count example on the sandbox with my own input file; this will help motivate me in learning Hadoop.
    So far I have done the following:
    1) Installed the sandbox on a virtual machine
    2) Logged into the sandbox command line as user “root”
    My next steps should be:
    1) Make the input directory
    2) Copy my own input file into the sandbox from the host PC
    3) Make the output directory
    4) Create the MapReduce program for word count
    5) Run the MapReduce job and check the output directory

    Please let me know if these steps are correct, and also help me with how to do them in the sandbox.
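    For reference, steps 1, 2, and 5 of a flow like this are shell commands run inside the sandbox. A minimal sketch, assuming the input file has been copied to /root/input.txt on the sandbox and using the examples jar bundled with the sandbox (all paths here are assumptions):

```shell
# 1) make the input directory in HDFS
hadoop fs -mkdir /user/root/wordcount-input
# 2) copy the local file into HDFS
hadoop fs -put /root/input.txt /user/root/wordcount-input/
# 3) do NOT pre-create the output directory -- the job creates it
#    itself and fails if it already exists
# 4+5) run the bundled word count job and inspect the result
hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount \
    /user/root/wordcount-input /user/root/wordcount-output
hadoop fs -cat /user/root/wordcount-output/part*
```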

  • Author
  • #29014

    Hi Chandra,

    Your steps are mostly correct; the only step that should NOT be done is making the output directory before running the MapReduce job, because the job creates that directory itself and will fail if it already exists. Also, to create your own MapReduce code you’ll either need to develop it outside the sandbox and then move it into the sandbox to run it, or use vi and the command-line tools inside the sandbox to compile the code.



    Hi Ted,
    Thanks for your reply.
    Could you please let me know how I can use vi to write the MapReduce code and compile it on the command line? Could you please send a step-by-step procedure? I am new to Hadoop; once I have these basics down I can play with the sandbox confidently.

    Yesterday I ran the sample word count Hadoop example that ships with the sandbox and it worked fine for me. Now I want to run something I have written myself, which is why I want to create my own word count example in the sandbox and run it.

    Sasha J

    There are a few steps:
    1. edit your source code using the “vi” text editor,
    2. compile your Java code using “javac”,
    3. run your compiled Java program using “java” (or “hadoop jar” for a packaged MapReduce job).
    Take a look at the word count sample code to get an idea of the MapReduce code logic:
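    In case the sample is hard to find, here is a hedged, plain-Java sketch of the same logic — not the actual Hadoop WordCount class, just the map phase (emit a (word, 1) pair per word) and the reduce phase (sum the counts per word) simulated with standard collections, so it compiles and runs without Hadoop:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the MapReduce word count logic (no Hadoop needed):
// the "map" phase emits (word, 1) pairs, the "reduce" phase sums per word.
public class WordCountSketch {

    // Map phase: split an input line into words and emit (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by key and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"hello hadoop", "hello sandbox"};
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {hadoop=1, hello=2, sandbox=1}
    }
}
```

    In the real Hadoop version the same two methods become a Mapper and a Reducer class, and the framework does the grouping between them.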

    Thank you!


    Hi Chandra,
    Could you please tell me the location in the sandbox where I can find the word count example?
    I am not able to locate it. Please help!



    Hi Suthan,

    The compiled code for the examples is in hadoop-examples.jar in the /usr/lib/hadoop directory. You will need to log into the sandbox shell to see this file. The source code is available online.
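    Assuming that jar location, running it with no program name prints the list of bundled examples, and naming one runs it (the input/output paths below are placeholders):

```shell
# lists the valid example program names in the jar
hadoop jar /usr/lib/hadoop/hadoop-examples.jar
# runs the bundled word count against HDFS paths
hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount \
    /user/root/input /user/root/output
```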



    Hi Ted,
    I couldn’t find any such file when I logged into the sandbox using the credentials root/hadoop. Please help me.
    How can I access the files I have uploaded into HDP using the file browser for my MapReduce projects?


    Sasha J

    All files uploaded to the cluster are located in HDFS.
    Use the command:

    hadoop fs -ls /

    to see the HDFS content.



    Could somebody give step-by-step instructions for running one’s own MapReduce jar file? Specifically, I am looking for help in two areas:
    1. How and where to copy my jar from Windows to Sandbox 1.3.
    2. How and where to copy my input file.
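    A hedged sketch of one common route — every host name, port, path, and class name here is an assumption (the sandbox VM typically forwards its SSH port to port 2222 on the host):

```shell
# From the Windows host (e.g. with pscp/WinSCP or scp), copy both files
# into the sandbox VM over its forwarded SSH port:
scp -P 2222 wordcount.jar input.txt root@127.0.0.1:/root/
# Then, inside the sandbox shell, put the input file into HDFS
# and run the jar (WordCount is a placeholder main class name):
hadoop fs -mkdir /user/root/input
hadoop fs -put /root/input.txt /user/root/input/
hadoop jar /root/wordcount.jar WordCount /user/root/input /user/root/output
```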


    How can I access files which I have uploaded into HDP using the file browser?

