Home Forums HDFS hadoop streaming

This topic contains 8 replies, has 3 voices, and was last updated by  tedr 1 year, 9 months ago.

  • Creator
    Topic
  • #12474

    elena diez
    Member

    Hi,

I’ve been trying to run a Hadoop MapReduce job with Hadoop Streaming because my scripts are in Python. The issue comes when I try to use functions from the NLTK library. I read that apparently you have to zip the library and ship it with the -file option, but that doesn’t work for me.
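For reference, the mechanism the -file approach relies on is that Python can import pure-Python packages directly from a zip archive placed on sys.path. A self-contained sketch of just that mechanism (the archive and module names here are made up for illustration, not NLTK itself):

```python
import sys
import zipfile

# Build a tiny stand-in for the shipped archive; in a real streaming job
# the zip sent with -file simply appears in the task's working directory.
with zipfile.ZipFile("mylib.zip", "w") as zf:
    zf.writestr("mylib.py", "def tokens(s):\n    return s.split()\n")

# In mapper.py you would do the same with the real archive name,
# before importing the library:
sys.path.insert(0, "mylib.zip")

import mylib  # resolved from inside the zip via Python's zipimport machinery

print(mylib.tokens("hello hadoop streaming"))  # -> ['hello', 'hadoop', 'streaming']
```

As far as I understand, this only works for pure-Python code; compiled extensions and NLTK’s separately downloaded data files are common reasons the zip approach fails.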

    Any ideas?

    Thank you in advance.

Viewing 8 replies - 1 through 8 (of 8 total)


  • Author
    Replies
  • #12550

    tedr
    Member

    Hi Elena,

    I’ll be interested in hearing how that goes when you try it.

    Thanks,
    Ted.

    #12548

    elena diez
    Member

    Hi Ted,

No, I didn’t try that option, but I’m still writing code, so if I need to add new modules I will try it instead of installing them manually.

    Elena

    #12544

    Sasha J
    Moderator

    Hi Elena,

Good to hear that you made it work. I am curious, though, whether you tried the -archives option.
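In case it helps, a sketch of what I had in mind; the jar location and HDFS paths below are illustrative placeholders, not taken from your setup:

```shell
# -archives unpacks the zip on every task node and exposes it in the
# task's working directory under the link name given after '#'.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -archives 'hdfs:///user/elena/nltk.zip#nltkdir' \
    -input /user/elena/input \
    -output /user/elena/output \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py
```

mapper.py would then do sys.path.insert(0, "nltkdir") before importing the library.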

    Thanks,
    Ted.

    #12539

    elena diez
    Member

    Hi,

I tried to change the user but it didn’t work. Instead, I manually installed the modules I need to import (nltk and enchant) on every node of the cluster, and that worked.
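Roughly what I did, as a sketch; the host names and the use of pip here are illustrative, the point is just that each node gets its own copy:

```shell
# Install the needed Python modules on every node of the cluster.
# Host list and package manager are assumptions for illustration;
# "pyenchant" is the Python binding for enchant.
for host in node1 node2 node3; do
    ssh "$host" "sudo pip install nltk pyenchant"
done
```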

I wish I knew how to do it with the -file option, but after trying different solutions I have not succeeded.

    Thank you, Elena.

    #12518

    tedr
    Member

    Hi Elena,

From looking at your log file I see that the reason the task failed is an AccessControlException, meaning that the job was probably launched as the wrong user. Try becoming mapred rather than hdfs when you launch the job.
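Something along these lines; the streaming jar path and HDFS paths are placeholders, not your actual setup:

```shell
# Launch the streaming job as the mapred user instead of hdfs.
sudo -u mapred hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -input /user/elena/input \
    -output /user/elena/output \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py
```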

    Thanks.
    Ted.

    #12498

    elena diez
    Member

    Hi Ted,

Thanks for replying. I’ve uploaded two files. One is called ErrorLogs.txt, with the logs; the other is called readme.txt, with the command I use from the command line to launch the job, the code of the mapper and reducer, their permissions, and so on.

    I hope it’s useful. Thank you very much.

    Elena.

    #12481

    tedr
    Member

    Hi Elena,

There might be more information about why the streaming job failed in the JobTracker and TaskTracker logs. Could you upload the JobTracker and TaskTracker logs to:

ftp ftp.support.hortonworks.com
    username: dropoff
    password: horton

    When we get these we’ll dig in and see if we can see why the job failed and how to fix it.

    Thanks,

    Ted.

    #12475

    elena diez
    Member

    By the way, the error I get is the following:
    ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201211161356_0078_m_000000
