Error while running sand box tutorial for pig script

This topic contains 8 replies, has 5 voices, and was last updated by Dave 10 months, 3 weeks ago.

    #33124

    Hi Folks,

    I am getting the error below while executing the Pig script from the Sandbox tutorial.

    # of failed Map Tasks exceeded allowed limit. FailedCount: 1.

    Can someone help me proceed?

Viewing 8 replies - 1 through 8 (of 8 total)

The topic ‘Error while running sand box tutorial for pig script’ is closed to new replies.

    #40315

    Dave
    Moderator

    Hi Ravi,

    No, renaming the columns will not work.
    This is because the header values are passed into an algebraic function (MAX) and they are not numeric.
    That is why applying the filter works, and it would be best practice.
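
    A minimal sketch of one way to make the types explicit, assuming that a failed cast from a non-numeric field (such as the header text) yields null in Pig. The alias names follow the tutorial script quoted in this thread; the explicit casts and the IS NOT NULL filter are additions for illustration, not part of the tutorial.

    ```pig
    -- Load without a schema, so every field starts out as a bytearray.
    batting = LOAD 'Batting.csv' USING PigStorage(',');
    -- Cast year and runs to int explicitly. Assumption: a non-numeric value
    -- (such as the header text) fails the cast and becomes null.
    runs_raw = FOREACH batting GENERATE $0 AS playerID, (int)$1 AS year, (int)$8 AS runs;
    -- Drop rows whose runs value is null; under the assumption above this
    -- includes the header row.
    runs = FILTER runs_raw BY runs IS NOT NULL;
    grp_data = GROUP runs BY (year);
    max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;
    DUMP max_runs;
    ```

    With the casts in place, MAX should only ever receive integers, so the header text never reaches it.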

    Thanks

    Dave

    #40177

    Hello Dave,
    Your suggestion works. I would appreciate more explanation of why the column names are not valid and need to be removed.

    In other words, can we rename the columns to make it work instead of removing the header row?

    #39166

    Dave
    Moderator

    Hi Krishna,

    This is because Batting.csv has column names in its first line, and those can’t be used in an algebraic function.
    If you were to remove the first line and remove the filter, then it would work.
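
    As a rough sketch of a related option, the header row could also be dropped inside the script rather than by editing the file. This assumes the first field of the header row is the literal text playerID, which this thread does not confirm, so check the first line of your copy of Batting.csv. The alias batting_raw is a hypothetical name introduced here.

    ```pig
    batting_raw = LOAD 'Batting.csv' USING PigStorage(',');
    -- Assumption: the header row's first field is the text 'playerID'.
    -- Keep every row except that one.
    batting = FILTER batting_raw BY (chararray)$0 != 'playerID';
    runs = FOREACH batting GENERATE $0 AS playerID, $1 AS year, $8 AS runs;
    grp_data = GROUP runs BY (year);
    max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;
    DUMP max_runs;
    ```

    This leaves the CSV file untouched, at the cost of hard-coding the header text in the script.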

    Thanks

    Dave

    #34719

    Hi Robert,

    Below is the code copied from the tutorial.

    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID,$1 as year,$8 as runs;
    grp_data = GROUP runs by (year);
    max_runs = FOREACH grp_data GENERATE group as grp,MAX (runs.runs) as max_runs;
    dump max_runs;

    but it gives the errors below:

    2013-09-07 07:04:04,968 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2106: Error executing an algebraic function
    2013-09-07 07:04:04,970 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
    2013-09-07 07:04:04,987 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    1.2.0.1.3.0.0-107 0.11.1.1.3.0.0-107 mapred 2013-09-07 07:02:20 2013-09-07 07:04:04 GROUP_BY

    Failed!

    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_201309070520_0002 batting,grp_data,max_runs,runs GROUP_BY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1.

    After googling, I found that the fix is to add one more statement, the FILTER on the third line below:

    batting = LOAD 'Batting.csv' using PigStorage(',');
    runs_raw = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    runs = FILTER runs_raw BY runs > 0;
    grp_data = group runs by (year);
    max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
    dump max_runs;

    So what is the difference?

    #34151

    Robert
    Participant

    Hi Krishna,
    Is the problem consistent? Can you try running the Pig script several times to verify that it reproduces? If so, please tell us which virtualization application and operating system you are using.

    Regards,
    Robert

    #33860

    Thanks, Dave and Sharma, for the responses.

    Sharma, the value of the property you mentioned is 1.

    Dave, I am trying the baseball statistics in tutorial 2.

    #33691

    Dave
    Moderator

    Hi Krishna,

    Which tutorial are you hitting an issue on?

    Thanks

    Dave

    #33259

    Akki Sharma
    Moderator

    Hello Krishna,

    In your mapred-site.xml file, please check the value of the property “mapred.job.reuse.jvm.num.tasks”.

    It should be 1. The property should look like:

    <property>
      <name>mapred.job.reuse.jvm.num.tasks</name>
      <value>1</value>
    </property>

    Then run the script again.

    Best Regards,
    Akki
