hadoop – MapReduce on a multi-node cluster



This topic contains 2 replies, has 2 voices, and was last updated by  Sorna Lingam 1 year, 4 months ago.

  • Creator
  • #43617

    Sorna Lingam

    I have configured a Hadoop cluster with two machines, DEV140 and DEV144. I run the MapReduce job with the following command:

    hadoop jar /HDP/hadoop- -mapper "python C:\Python33\mapper.py" -reducer "python C:\Python33\redu.py" -input "/user/sornalingam/input/input.txt" -output "/user/sornalingam/output/out20131112_09"

    where the mapper (C:\Python33\mapper.py) and reducer (C:\Python33\redu.py) are on DEV144's local disk.

    The MapReduce job runs only on machine DEV144, not on DEV140. I have searched for a solution but could not find any resource. Kindly help me soon.

    How can I run the MapReduce job so that it uses both machines in the cluster?

    For more detail, refer to this link.
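For reference, a Hadoop Streaming mapper and reducer simply read lines from stdin and emit tab-separated key/value pairs on stdout; Hadoop sorts the map output by key before the reduce phase. The contents of the posted mapper.py and redu.py are not shown, so the word-count pair below is only a hypothetical illustration of that contract:

```python
#!/usr/bin/env python
# Hypothetical word-count scripts following the Hadoop Streaming
# contract (the original mapper.py/redu.py are not shown in the post):
# read lines from stdin, write tab-separated key/value pairs to stdout.
import sys
from itertools import groupby


def mapper(lines):
    """Emit (word, 1) for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(pairs):
    """Sum the counts for each word. The input must be sorted by key,
    which Hadoop guarantees between the map and reduce phases."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else "map"
    if mode == "map":
        for key, value in mapper(sys.stdin):
            print("%s\t%d" % (key, value))
    else:
        split_lines = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for key, value in reducer((k, int(v)) for k, v in split_lines):
            print("%s\t%d" % (key, value))
```

In a streaming job these would run as two separate invocations (one per phase); Hadoop pipes each input split through the mapper and each sorted key group through the reducer.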






  • Author
  • #43670

    Sorna Lingam

    Hi Seth Lyubich

    I traced down the error log and found this:

    In machine DEV140: python: can't open file 'C:\Python33\mapper.py': [Errno 2] No such file or directory

    That is the error I am getting.

    Actually, my map and reduce programs are only on DEV144's local drive.

    Now, how can I resolve this?

    1. Do I need to have my map and reduce programs on every node in the cluster?
    2. How can I solve this?



    Seth Lyubich

    Hi Sorna,

    I believe this was addressed on http://stackoverflow.com/questions/19928671/hadoop-map-reduce-on-multiple-cluster . In addition, the JobTracker manages where each task is processed. If both TaskTrackers are up and configured with slots, the tasks may get scheduled based on data locality.
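One common fix for the "No such file" error on DEV140 (a sketch only, keeping the abbreviated jar path and file paths from the original post) is to let Hadoop Streaming ship the scripts with the job via the -file option, which copies each listed local file into every task's working directory on every node; Python itself must still be installed and on the PATH on each node:

```shell
# Sketch: ship mapper.py and redu.py with the job so both DEV140 and
# DEV144 receive local copies. Because -file places the files in each
# task's working directory, the -mapper/-reducer commands reference
# them by bare filename. The jar path is abbreviated as in the post.
hadoop jar /HDP/hadoop- \
    -file "C:\Python33\mapper.py" -mapper "python mapper.py" \
    -file "C:\Python33\redu.py"   -reducer "python redu.py" \
    -input  "/user/sornalingam/input/input.txt" \
    -output "/user/sornalingam/output/out20131112_09"
```

So to question 1: you do not need to pre-install the scripts on every node yourself; -file (or the newer generic -files option) distributes them for you, but the Python interpreter does need to exist on all nodes.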

    Hope this helps,

