Python Streaming MapReduce

This topic contains 4 replies, has 2 voices, and was last updated by Seth Lyubich 12 months ago.

  • Creator
    Topic
  • #23178

    owen rumney
    Member

    Hi

I’ve got a working 3-node cluster running on Windows Server 2012 VMs, and the smoke test runs just fine.

I’ve spent the day fighting with running a MapReduce job written in Python using the hadoop-streaming approach. I have been passing the mapper and the reducer (mapper.py and reduce.py) in with -files and sending them into the JAR using -file, but I kept getting an error that mapper.py could not be found.

I tried passing python.exe into the JAR and setting the mapper to “python.exe mapper.py”… this improves things to a certain extent, but now it’s failing with a syntax error in the mapper, which I’m pretty sure is because python.exe can’t find its dependencies.

    This all seems very much to be an environment issue with Python and the path. I have c:\python27 in the PATH system variable, and running

    echo "foo bar foo bar" | mapper.py

    works as expected, so the path feels okay.

Has anyone experienced anything like this, or have any ideas about how I can get Python working here?

    Thanks in advance.
    Owen
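
    [Editor’s note: for readers following along, a minimal streaming mapper/reducer pair of the kind described above might look like the following word-count sketch. The function names and structure are illustrative, not Owen’s actual scripts; mapper.py would feed sys.stdin through map_lines and print each result, and reduce.py would do the same with reduce_pairs.]

    ```python
    import sys
    from itertools import groupby

    def map_lines(lines):
        """Mapper: emit 'word<TAB>1' for every whitespace-separated token."""
        for line in lines:
            for word in line.split():
                yield f"{word}\t1"

    def reduce_pairs(lines):
        """Reducer: sum counts per word.

        Hadoop streaming delivers reducer input sorted by key, so
        consecutive lines with the same word can be grouped directly.
        """
        keyed = (line.rstrip("\n").split("\t") for line in lines)
        for word, group in groupby(keyed, key=lambda kv: kv[0]):
            yield f"{word}\t{sum(int(count) for _, count in group)}"
    ```
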

Viewing 4 replies - 1 through 4 (of 4 total)


  • Author
    Replies
  • #23515

    Seth Lyubich
    Keymaster

    Hi Owen,

    Thanks for the update. Can you please provide the full command line that you are using?

    Thanks,
    Seth

    #23491

    owen rumney
    Member

    Hi Seth

I’ve been trying to do a MapReduce using Python and hadoop-streaming, but the problem was that reduce.py wasn’t being copied into the job, and for some reason I was only able to run local jobs.

I chased it for a few days and then finally spotted it. I was doing this:

    -files file:///d:/dev/python/mapper.py,file:///d:/dev/python/reduce.py

when I needed to do this:

    -files “file:///d:/dev/python/mapper.py,file:///d:/dev/python/reduce.py”

My only outstanding problem, which I’m working around at the moment, is that when I try to use -D to pass in Hadoop settings I get an error that -D is unrecognised, even though the documentation makes me think it should be okay.
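
    [Editor’s note: Hadoop’s generic options, including -D, are parsed by GenericOptionsParser and must appear before the streaming-specific options, which is a common cause of “unrecognised option” errors. A sketch of the expected ordering on Windows — the jar name, paths, and property values here are illustrative:]

    ```shell
    REM Generic options (-D) first, then streaming options
    hadoop jar hadoop-streaming.jar ^
        -D mapred.reduce.tasks=1 ^
        -files "file:///d:/dev/python/mapper.py,file:///d:/dev/python/reduce.py" ^
        -mapper "python.exe mapper.py" ^
        -reducer "python.exe reduce.py" ^
        -input /data/input ^
        -output /data/output
    ```
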

    Thanks for the response.

    Owen

    #23449

    Seth Lyubich
    Keymaster

    Hi Owen,

Thanks for the update. Can you please share more details on what you are trying to do?

    Thanks,
    Seth

    #23282

    owen rumney
    Member

So I worked it out: I need to pass -cmdenv PYTHONPATH=c:\python27 for some reason.
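
    [Editor’s note: a sketch of how -cmdenv slots into a streaming invocation — the jar name and paths are illustrative. -cmdenv sets an environment variable for the launched mapper and reducer processes, which is why it fixes Python’s module lookup here:]

    ```shell
    hadoop jar hadoop-streaming.jar ^
        -cmdenv PYTHONPATH=c:\python27 ^
        -mapper "python.exe mapper.py" ^
        -reducer "python.exe reduce.py" ^
        -input /data/input ^
        -output /data/output
    ```
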

Now I just need to work out why the mapper runs but the reducer doesn’t, and how to pass files to -files using a full path instead of copying them into the working folder.

    Any thoughts?

    Thanks,
    Owen
