HDP on Windows – Installation Forum

hadoop – Map reduce on multiple cluster

  • #43617
    Sorna Lingam

    I have configured a Hadoop cluster with two machines, DEV140 and DEV144. When I run the MapReduce program using the following command:

    hadoop jar /HDP/hadoop- -mapper "python C:\Python33\mapper.py" -reducer "python C:\Python33\redu.py" -input "/user/sornalingam/input/input.txt" -output "/user/sornalingam/output/out20131112_09"

    where the mapper (C:\Python33\mapper.py) and the reducer (C:\Python33\redu.py) are on DEV144's local disk,

    the MapReduce job runs only on machine DEV144, not on DEV140. I have searched for a solution but could not find any resource. Kindly help.

    How can I run the MapReduce job so that it uses both machines in the cluster?

    For more detail, refer to this link.
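
    For context, a Hadoop Streaming mapper reads lines from stdin and writes tab-separated key/value pairs to stdout, and the reducer receives that output sorted by key. The actual mapper.py and redu.py are not shown in the thread; a minimal word-count-style pair might look like this (purely illustrative):

    # mapper.py - illustrative streaming mapper (not the script from the thread)
    # Reads lines from stdin and emits one "word<TAB>1" pair per word.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print('%s\t%s' % (word, 1))

    # redu.py - illustrative streaming reducer (not the script from the thread)
    # Input arrives sorted by key, so equal words are adjacent; sum their counts.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip('\n').split('\t', 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print('%s\t%s' % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print('%s\t%s' % (current_word, current_count))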





  • #43648
    Seth Lyubich

    Hi Sorna,

    I believe this was addressed on http://stackoverflow.com/questions/19928671/hadoop-map-reduce-on-multiple-cluster . In addition, the JobTracker manages where each task is processed. If both TaskTrackers are up and configured with slots, tasks may be scheduled on either machine based on data locality.
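
    One way to make the scripts available on every node is the streaming -file option, which ships a local file with the job so each task works from its own copy. A sketch of the command (the streaming jar path and output directory are placeholders, not taken from the thread):

    hadoop jar <path-to-hadoop-streaming.jar> -file "C:\Python33\mapper.py" -mapper "python mapper.py" -file "C:\Python33\redu.py" -reducer "python redu.py" -input "/user/sornalingam/input/input.txt" -output "/user/sornalingam/output/out1"

    Because -file places the scripts in each task's working directory, the -mapper and -reducer commands reference them by bare filename rather than by a path that exists only on DEV144.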

    Hope this helps,


    Sorna Lingam

    Hi Seth Lyubich

    I traced the error log and found this on machine DEV140:

    python: can't open file 'C:\Python33\mapper.py': [Errno 2] No such file or directory

    My map and reduce programs are only on DEV144's local drive.

    Now, how can I resolve this?

    1. Do I need to have my map and reduce programs on every machine in the cluster?
    2. How can I solve this?


