
HDP on Windows – Installation Forum

Python Streaming MapReduce

  • #23178
    owen rumney


    I’ve got a successful cluster of three nodes running on three Windows Server 2012 VMs, and the smoke test runs just fine.

    I’ve spent the day fighting with running a MapReduce job written in Python using the hadoop-streaming approach. I have been passing the mapper and the reducer in with -files (and also sending them into the JAR with -file), but I kept getting an error that they could not be found.

    I tried passing python.exe into the JAR and setting the mapper to “python.exe”. This improves things to a certain extent, but now it’s failing with a syntax error in the mapper, which I’m pretty sure is because python.exe can’t find its dependencies.

    This all seems very much like an environment issue with Python and the path. I have c:\python27 in the PATH system variable, and running

    echo “foo bar foo bar” |

    locally works as expected, so the path feels okay.

    Has anyone experienced anything like this, or have any ideas about how I can get Python streaming working?

    Thanks in advance.
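For context, a hadoop-streaming mapper is just a script that reads lines on stdin and writes tab-separated key/value pairs to stdout. A minimal word-count-style sketch (illustrative only, not Owen's actual mapper):

```python
import io

def map_line(line):
    """Emit word<TAB>1 pairs for one input line, as streaming expects."""
    return ["%s\t1" % word for word in line.strip().split()]

def run_mapper(stdin, stdout):
    # Hadoop streaming feeds the mapper on stdin and sorts its stdout by key.
    for line in stdin:
        for pair in map_line(line):
            stdout.write(pair + "\n")

# Local smoke test, equivalent to piping echo output through the script:
out = io.StringIO()
run_mapper(io.StringIO("foo bar foo bar\n"), out)
print(out.getvalue())
```

Testing the script this way locally (piping a sample line through it) exercises the same stdin/stdout contract the streaming framework uses on the cluster.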

  • #23282
    owen rumney

    So I worked it out: I need to pass -cmdenv PYTHONPATH=c:\python27 for some reason.

    Now I just need to work out why the mapper runs but the reducer doesn’t, and how to pass files to -files using a full path instead of copying them to the working folder.

    Any thoughts?
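For anyone hitting the same reducer problem: a streaming reducer receives the mapper's output on stdin, sorted by key, one key<TAB>value pair per line, and has to group consecutive identical keys itself. A minimal sketch under those assumptions (not Owen's actual reducer):

```python
import io
from itertools import groupby

def reduce_pairs(lines):
    """Sum the counts of consecutive identical keys (input arrives key-sorted)."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    return ["%s\t%d" % (key, sum(int(count) for _, count in group))
            for key, group in groupby(parsed, key=lambda kv: kv[0])]

# Local smoke test on already-sorted mapper output:
for line in reduce_pairs(io.StringIO("bar\t1\nbar\t1\nfoo\t1\n")):
    print(line)
```

Note the reliance on sorted input: groupby only merges adjacent keys, which is exactly the guarantee the streaming shuffle provides.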


    Seth Lyubich

    Hi Owen,

    Thanks for the update. Can you please share more details on what you are trying to do?


    owen rumney

    Hi Seth

    I’ve been trying to run a MapReduce job using Python and hadoop-streaming, but the problem was that the files weren’t being copied into the job, and I was only able to run local jobs for some reason.

    I chased it for a few days and then finally spotted it. I was doing this:

    -files file:///d:/dev/python/,file:///d:/dev/python/

    when I needed to do this:

    -files “file:///d:/dev/python/,file:///d:/dev/python/”

    My only outstanding problem, which I’m working around at the moment, is that when I try to use -D to pass in Hadoop settings I get an error that -D is unrecognised, when the documentation makes me think it should be okay.

    Thanks for the response.
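On the -D error: a common cause (a general Hadoop convention, not something confirmed in this thread) is argument order, since generic options such as -D, -files and -cmdenv must come before the streaming-specific options like -input and -mapper. A sketch of the whole invocation as an argument list; the JAR path, script names and HDFS paths here are placeholders:

```python
# Hypothetical streaming invocation; all paths and file names are placeholders.
# Generic options (-D, -files, -cmdenv) must precede the streaming options
# (-input, -output, -mapper, -reducer), and the comma-separated -files list
# must stay a single argument, which is why it needs quoting on the shell.
streaming_cmd = [
    "hadoop", "jar", "hadoop-streaming.jar",
    "-D", "mapreduce.job.reduces=1",
    "-files", "file:///d:/dev/python/mapper.py,file:///d:/dev/python/reducer.py",
    "-cmdenv", "PYTHONPATH=c:\\python27",
    "-input", "/user/owen/input",
    "-output", "/user/owen/output",
    "-mapper", "python.exe mapper.py",
    "-reducer", "python.exe reducer.py",
]
# Launching with an argument list (e.g. subprocess.check_call(streaming_cmd))
# side-steps shell quoting entirely.
print(" ".join(streaming_cmd))
```

Passing the command as a list rather than one shell string avoids the quoting problem with the -files value altogether.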


    Seth Lyubich

    Hi Owen,

    Thanks for the update. Can you please provide the full command line that you are using?


