MapReduce in Sandbox

This topic contains 11 replies, has 6 voices, and was last updated 1 year, 4 months ago.

  • Creator
    Topic
  • #28816

    Duncan Gunn
    Member

    I might be missing something here, but when I try to run the standard word count MapReduce job in the sandbox, it runs successfully but the generated output is just the input file!

    I know this code works as I have verified it separately using Amazon EMR.

    I create the job in the job designer and specify the following properties:

    mapred.output.dir
    mapred.input.dir

    I point the input dir at a words.txt file.

    I would expect the output to be a count of each word, e.g. apple 3, orange 1, and so on.
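    For reference, the stock Hadoop wordcount writes tab-separated word/count pairs, one word per line, so a correct run over a small input would look something like:

    apple	3
    orange	1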

    Instead I just get the original input file back as output!

    What am I doing wrong? It’s as if the map and reduce aren’t running at all!

    Thanks



  • Author
    Replies
  • #30176


    Teja
    Member

    Hi Suthan,

    The example programs are in /usr/lib/hadoop.
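    For example, to locate the examples jar there (the exact version suffix varies by HDP release):

    ls /usr/lib/hadoop/hadoop-examples*.jar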

    Thanks,
    Teja

    #30173

    Suthan
    Member

    Hi,
    Could anyone please suggest the exact location of the wordcount program? I logged into the sandbox using the root/hadoop credentials and navigated to the home directory, but could not find any usr or lib directory there. Please help.

    Thanks,
    Suthan

    #29010

    tedr
    Moderator

    Hi Chandra,

    When I follow the steps you’ve given the output is as it should be, a count of the words in the input file.

    Thanks,
    Ted.

    #28996

    Chandra
    Member

    Hi Abdelrahman,
    Please find the exact steps below:
    1) Log in to the sandbox shell with the credentials "root" and "hadoop".
    2) Go to the home directory:
    cd /home
    3) Make a directory dft here (you can use any directory of your choice):
    mkdir dft
    cd dft
    4) Download the input file whose words you will count:
    wget http://www.gutenberg.org/files/4300/4300.zip
    5) unzip 4300.zip
    6) rm 4300.zip
    7) Copy the files to HDFS:
    hadoop dfs -copyFromLocal /home/dft dft
    8) hadoop dfs -ls
    9) hadoop dfs -ls dft
    10) hadoop jar /usr/lib/hadoop/hadoop-examples-1.2.0.1.3.0.0-107.jar wordcount dft dft-output
    Then check the output:
    11) hadoop dfs -ls
    12) hadoop dfs -ls dft-output
    13) hadoop dfs -cat dft-output/part-00000 | less
    14) Copy the output back to the local dft directory:
    hadoop dfs -copyToLocal dft-output/part-00000 .
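    To sanity-check the result: the output file holds tab-separated word/count pairs, so a pipeline like the following (assuming the single part-00000 file from step 13) lists the twenty most frequent words:

    hadoop dfs -cat dft-output/part-00000 | sort -k2,2nr | head -20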
    Thanks

    #28900

    tedr
    Moderator

    Hi Duncan,

    Yup, it is a bit strange that it doesn’t work there for you. I am checking to see if I get the same problem.

    Thanks,
    Ted.

    #28890

    Duncan Gunn
    Member

    Excellent; thanks very much, it worked!

    A bit strange that it doesn't work via the Sandbox GUI, though.

    #28862

    tedr
    Moderator

    Hi Duncan,

    To run a Hadoop job from the command line you need to either SSH into the sandbox or open a shell prompt directly in the VM. The latter is easiest: click in the Sandbox VM window and press the key combination shown in the window (usually Alt+F5). When asked for a username, enter 'root', then enter 'hadoop' as the password. You will then be at a shell prompt where you can run command-line tools.
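    Alternatively, you can SSH in from your host machine. Assuming the sandbox's default NAT port forwarding (guest port 22 mapped to host port 2222; adjust the address and port to your VM's network settings), something like this works:

    ssh root@127.0.0.1 -p 2222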

    Thanks,
    Ted.

    #28830

    Duncan Gunn
    Member

    This is maybe a silly question, but how do I get to the command line? I've tried to set up a shell job in the past, but that doesn't seem to work!

    Thanks

    #28829

    abdelrahman
    Moderator

    Hi Duncan,

    Thank you for providing the details. Let's first run a simple word count from the command line. Before running the job, create the output directory in HDFS:
    > hadoop fs -mkdir /tmp/output_wordcount
    Then run the example, substituting the HDFS directory that holds your input files for <input-dir>:
    > bin/hadoop jar hadoop-*-examples.jar wordcount -m 4 -r 1 <input-dir> /tmp/output_wordcount
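    Once the job finishes, you can verify the counts (with -r 1 there is a single reducer, hence one output file):
    > hadoop fs -ls /tmp/output_wordcount
    > hadoop fs -cat /tmp/output_wordcount/part-00000 | head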
    Let me know if this works for you.

    Thanks
    -Abdelrahman

    #28826

    Duncan Gunn
    Member

    Hi Abdelrahman

    Good thanks! Hope you are well also.

    My exact steps (from memory) are:

    – copy my wordcount.jar to /user/hue/examples directory
    – create new MapReduce job in the job designer
    – complete the path to the wordcount.jar file
    – add two properties, mapred.output.dir and mapred.input.dir, and set them as ${vars} (see the note below)
    – save and submit the job
    – enter the input and output parameters
    – the job seems to run fine, but the part-00000 file it produces contains exactly the same content as the input file! It's as if the job has just copied the input.

    I’ve looked at the logs and it seems to be running through all the right steps from what I can see.

    I’m obviously doing something very wrong, but I’m lost!
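    A likely cause, assuming wordcount.jar targets the old mapred API: when mapred.mapper.class and mapred.reducer.class are left unset, Hadoop falls back to IdentityMapper and IdentityReducer, which pass every record through unchanged, giving exactly this "output equals input" behaviour. A sketch of the extra properties the job-designer entry would need (the class names below are hypothetical; substitute the ones packaged in your wordcount.jar):

    mapred.mapper.class = com.example.WordCountMapper
    mapred.reducer.class = com.example.WordCountReducer
    mapred.output.key.class = org.apache.hadoop.io.Text
    mapred.output.value.class = org.apache.hadoop.io.IntWritable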

    Thanks

    #28823

    abdelrahman
    Moderator

    Hi Duncan,

    How is your day so far? Can you please provide the exact steps that you have followed to run the word count?

    Thanks
    -Abdelrahman
