
Hortonworks Sandbox Forum

Error while running sand box tutorial for pig script

  • #33124

    Hi Folks,

I am getting the error below while executing the Pig script from the sandbox tutorial.

    # of failed Map Tasks exceeded allowed limit. FailedCount: 1.

Can someone help me proceed?

  • Author
  • #33259
    Akki Sharma

    Hello Krishna,

In your mapred-site.xml file, please check the value of the property "mapred.job.reuse.jvm.num.tasks".

    It should be 1. The property should look like:
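For reference, a mapred-site.xml entry with that property set to 1 would look like this (a sketch using standard Hadoop configuration syntax; the original snippet is not shown above):

```xml
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>1</value>
</property>
```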


    and run the script again.

    Best Regards,


    Hi Krishna,

    Which tutorial are you hitting an issue on?




Thanks Dave and Sharma for the responses.

Sharma, I can see the value is 1 for the property you mentioned.

Dave, I am trying the baseball statistics example in Tutorial 2.


    Hi Krishna,
Is the problem consistent? Can you try running the Pig script multiple times to verify that it reproduces? If so, please let us know which virtualization application and operating system you are using.



    Hi Robert,

Below is the code copied from the tutorial.

batting = load 'Batting.csv' using PigStorage(',');
runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
grp_data = GROUP runs by (year);
max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
dump max_runs;

but it gives the errors below:

    2013-09-07 07:04:04,968 [main] ERROR – ERROR 2106: Error executing an algebraic function
    2013-09-07 07:04:04,970 [main] ERROR – 1 map reduce job(s) failed!
    2013-09-07 07:04:04,987 [main] INFO – Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
mapred 2013-09-07 07:02:20 2013-09-07 07:04:04 GROUP_BY


    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_201309070520_0002 batting,grp_data,max_runs,runs GROUP_BY,COMBINER Message: Job failed! Error – # of failed Map Tasks exceeded allowed limit. FailedCount: 1.

After googling, I found that the fix is to add the additional FILTER statement shown below:

batting = LOAD 'Batting.csv' using PigStorage(',');
runs_raw = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
runs = FILTER runs_raw BY runs > 0;
grp_data = group runs by (year);
max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
dump max_runs;

So what is the difference?


    Hi Krishna,

This is because Batting.csv has its column names in the first row, and those header strings cannot be processed by an algebraic function such as MAX.
If you were to remove the first line of the file, the script would work even without the filter.
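As an alternative to editing the file, the header row can be dropped inside the script itself. A sketch, assuming the standard Batting.csv header whose first field is the literal string 'playerID':

```
batting  = LOAD 'Batting.csv' USING PigStorage(',');
-- drop the header row by matching its first column against the header label
raw_runs = FILTER batting BY $0 != 'playerID';
runs     = FOREACH raw_runs GENERATE $0 AS playerID, $1 AS year, $8 AS runs;
grp_data = GROUP runs BY (year);
max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;
DUMP max_runs;
```

If your copy of the file uses a different header label in the first column, adjust the comparison string accordingly.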




    Hello Dave,
Your suggestion works. I would appreciate it if you could explain why the column names are not valid and need to be removed.

In other words, can we rename the columns to make it work instead of removing the header row?


    Hi Ravi,

No, renaming the columns will not work.
This is because the values are passed into an algebraic function (MAX), and the header values are not numeric; whatever you rename them to, they still cannot be cast to a number.
This is why applying the filter works: the header row's runs field fails the numeric comparison and is dropped. That makes the filter the best practice here.



The topic ‘Error while running sand box tutorial for pig script’ is closed to new replies.
