Pig Forum

# of failed Map Tasks exceeded allowed limit. FailedCount: 1.

  • #27748
    Pavan Bolla

    I have followed the instructions given in Tutorial 2: Data Processing with Pig (processing baseball stats with Pig).
    Pig script:
    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    grp_data = GROUP runs BY (year);
    max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
    join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
    join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
    dump join_data;

    After executing it, I got the error below.

    # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201306181545_0002_m_000000

    2013-06-18 16:03:46,470 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2106: Error executing an algebraic function
    2013-06-18 16:03:46,470 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
    2013-06-18 16:03:46,513 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    mapred 2013-06-18 16:02:06 2013-06-18 16:03:46 HASH_JOIN,GROUP_BY


    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_201306181545_0002 batting,grp_data,max_runs,runs MULTI_QUERY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201306181545_0002_m_000000

    Failed to read data from "hdfs://sandbox:8020/user/hue/Batting.csv"


    Total records written : 0
    Total bytes written : 0
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    job_201306181545_0002 -> null,

    2013-06-18 16:03:46,514 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
    2013-06-18 16:03:46,515 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias join_data
    Details at logfile: /hadoop/mapred/taskTracker/hue/jobcache/job_201306181545_0001/attempt_201306181545_0001_m_000000_0/work/pig_1371596519449.log

    Please suggest how I should proceed.


  • Author
  • #27751
    tedr

    Hi Pavan,

    If you modify the script a bit, changing/adding the following, it will work:

    runs_raw = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    runs = FILTER runs_raw BY runs > 0;
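
    For reference, here is the full script with that filter folded in. This is only a sketch assembled from the script posted above plus the two lines in this reply; the file name and column positions come from that post. Filtering on runs > 0 drops the CSV header row and any record whose runs field is empty or non-numeric, so MAX() only sees numeric values, which appears to be what was failing with ERROR 2106.

    batting = LOAD 'Batting.csv' USING PigStorage(',');
    -- Pull out the three columns the tutorial uses.
    runs_raw = FOREACH batting GENERATE $0 AS playerID, $1 AS year, $8 AS runs;
    -- Keep only records with a usable, positive runs value.
    runs = FILTER runs_raw BY runs > 0;
    grp_data = GROUP runs BY (year);
    max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;
    join_max_run = JOIN max_runs BY ($0, max_runs), runs BY (year, runs);
    join_data = FOREACH join_max_run GENERATE $0 AS year, $2 AS playerID, $1 AS runs;
    DUMP join_data;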



    Gerald

    Hi tedr,

    Thank you for the solution. Works for me too.



    tedr

    Hi Gerald,

    Thanks for letting us know.


    Kevin Knaus

    I had the same error too. While your reply, Ted, fixes the error, it does not really explain why the presence of a zero in the “runs” field causes the failure. The real issue for me was that, when I tried to examine and dig through the log report about the failure, it only said “unable to open iterator for alias join_data”. Is there a white paper or some other resource that would help someone understand what failing to open an iterator implies? I assume it is a general error that can appear for a variety of issues, not just the unfiltered zero-value records. Thanks again for the fix you posted.

    Jianyong Dai

    Is the failed job the first one or the second one? The idea is to reduce the number of mappers. For the first job, you can increase "pig.maxCombinedSplitSize" to allow each mapper to take more input files. For the second job, in addition to the previous trick, reducing the number of reducers in the first job will decrease the number of input part files, which also helps.
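
    In Pig Latin those two knobs look roughly like the sketch below. The 256 MB split size and the single reducer are illustrative values, not numbers given in this thread; tune them to your cluster.

    -- Combine small input splits so each mapper reads up to ~256 MB,
    -- which launches fewer map tasks for the first job.
    SET pig.splitCombination true;
    SET pig.maxCombinedSplitSize 268435456;

    -- Fewer reducers on the first (GROUP BY) job means fewer part files
    -- feeding the second (JOIN) job, so that job also starts fewer mappers.
    grp_data = GROUP runs BY (year) PARALLEL 1;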

    Antonio Paternina

    # of failed Map Tasks exceeded allowed limit. FailedCount: 2. LastFailedTask:

    It does not work for me, and when I increase the number I get the same error. Please let me know if there is any other solution.

