HDFS Forum

hadoop streaming

  • #12474
    elena diez
    Member

    Hi,

I’ve been trying to run a Hadoop MapReduce job using Hadoop Streaming because my scripts are in Python. The issue comes when I try to use functions from the NLTK library. I read that apparently you have to zip the library and ship it with the -file option, but that doesn’t work.

    Any ideas?

    Thank you in advance.
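    For reference, the kind of launch command being described looks roughly like this. The streaming jar path, HDFS paths, and script names below are placeholder assumptions, not details taken from this thread:

    ```shell
    # Hypothetical sketch: ship the Python scripts and a zipped copy of the
    # nltk package alongside the job. All paths/names are assumptions.
    zip -r nltk.zip nltk/

    hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar \
        -file mapper.py  -mapper  mapper.py \
        -file reducer.py -reducer reducer.py \
        -file nltk.zip \
        -input  /user/elena/input \
        -output /user/elena/output
    ```

    One common pitfall with this approach: -file only copies the zip into the task’s working directory, it is not unpacked automatically, so the script must put the zip on sys.path (Python can import from a zip archive) before `import nltk` will succeed.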


  • Author
    Replies
  • #12475
    elena diez
    Member

    By the way, the error I get is the following:
    ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201211161356_0078_m_000000

    #12481
    tedr
    Member

    Hi Elena,

    There might be more information about why the streaming job failed in the JobTracker and TaskTracker logs. Could you upload the JobTracker and TaskTracker logs to:

    ftp ftp.support.hortonworks.com
    username: dropoff
    password: horton

    When we get these we’ll dig in and see if we can see why the job failed and how to fix it.

    Thanks,

    Ted.

    #12498
    elena diez
    Member

    Hi Ted,

    Thanks for replying. I’ve uploaded two files. One is called ErrorLogs.txt with the logs and the other one is called readme.txt with the command I use from the command line to launch the job, the code of the mapper and reducer, their permissions and all.

    I hope it’s useful. Thank you very much.

    Elena.

    #12518
    tedr
    Member

    Hi Elena,

    From looking at your log file, I see that the task failed with an AccessControlException, meaning that the job was probably launched as the wrong user. Try launching the job as the mapred user rather than hdfs.

    Thanks.
    Ted.
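
    Concretely, the suggestion above would look something like this (the streaming jar path and HDFS paths are assumptions):

    ```shell
    # Launch the streaming job as the mapred user instead of hdfs.
    # Assumes sudo rights on the client machine; paths are assumptions.
    sudo -u mapred hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar \
        -file mapper.py  -mapper  mapper.py \
        -file reducer.py -reducer reducer.py \
        -input  /user/elena/input \
        -output /user/elena/output
    ```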

    #12539
    elena diez
    Member

    Hi,

    I tried changing the user but it didn’t work. Instead, I manually installed the modules I needed to import (nltk and enchant) on every node of the cluster, and that worked.

    I wish I knew how to do it with the -file option but, after trying different solutions, I have not succeeded.

    Thank you, Elena.
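
    For anyone reading along, the per-node install described above amounts to something like the following (the hostnames are placeholders, and the pip package names are assumptions — on PyPI the enchant bindings are published as pyenchant):

    ```shell
    # Run the install on every node of the cluster (hostnames are placeholders).
    for host in node1 node2 node3; do
        ssh "$host" 'sudo pip install nltk pyenchant'
    done
    ```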

    #12544
    Sasha J
    Moderator

    Hi Elena,

    Good to hear you got it working. I’m curious, though: did you try the -archives option?

    Thanks,
    Ted.
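
    For the record, the -archives approach differs from -file in that the archive is unpacked on the task nodes and exposed through a symlink in the task’s working directory, so the script can import from the unpacked directory. A rough sketch (all paths and names here are assumptions):

    ```shell
    # Zip the installed nltk package, stage it on HDFS, and reference it with
    # -archives; the #nltkroot suffix names the symlink created in the task's
    # working directory. All paths are assumptions.
    zip -r nltk.zip nltk/
    hadoop fs -put nltk.zip /user/elena/

    hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar \
        -archives hdfs:///user/elena/nltk.zip#nltkroot \
        -file mapper.py  -mapper  mapper.py \
        -file reducer.py -reducer reducer.py \
        -input  /user/elena/input \
        -output /user/elena/output
    # In mapper.py, add sys.path.insert(0, 'nltkroot') before "import nltk".
    ```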

    #12548
    elena diez
    Member

    Hi Ted,

    No, I didn’t try that option, but I’m still writing code, so if I need to add new modules I’ll try it instead of installing them manually.

    Elena

    #12550
    tedr
    Member

    Hi Elena,

    I’ll be interested in hearing how that goes when you try it.

    Thanks,
    Ted.

