Sandbox 1.3 Tutorial Error

This topic contains 2 replies, has 2 voices, and was last updated by  Lamprey 1 year, 1 month ago.

  • Creator
    Topic
  • #30486

    Lamprey
    Member

    Hi,

    I’m running Sandbox 1.3 on VirtualBox and I’m getting an error when running the Pig scripts that process the Batting.csv file. If I run only the first part of the script, I get results:

    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    dump runs;

    But if I run the “full” tutorial script, I get an error. I’m new to interpreting these errors. Any thoughts appreciated!

    Here is my script. It’s possible I made a typo, but I also tried copying and pasting from the tutorial text:

    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    grp_data = GROUP runs by (year);
    max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
    join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
    join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
    dump join_data;

    Here is part of the error log:

    2013-07-31 11:15:41,409 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2106: Error executing an algebraic function
    2013-07-31 11:15:41,410 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
    2013-07-31 11:15:41,423 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    1.2.0.1.3.0.0-107 0.11.1.1.3.0.0-107 mapred 2013-07-31 11:14:11 2013-07-31 11:15:41 HASH_JOIN,GROUP_BY

    Failed!

    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_201307311011_0002 batting,grp_data,max_runs,runs MULTI_QUERY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201307311011_0002_m_000000

    Input(s):
    Failed to read data from "hdfs://sandbox:8020/user/hue/Batting.csv"

  • Author
    Replies
  • #30518

    Lamprey
    Member

    Yes.

    As I mentioned, the partial sample works just fine and reads the file.

    The actual error seems to be:
    ERROR 2106: Error executing an algebraic function
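
One possible cause, not confirmed anywhere in this thread: if Batting.csv begins with a header line, the runs field of that row holds text rather than a number. A plain dump would still print it, but MAX has to convert every value to a number and could fail on that row, which would fit the partial script working while the full one does not. A minimal sketch of the first half of the script with such a row filtered out, assuming the header's year field is non-numeric (raw_runs is a hypothetical alias, not in the original script):

```pig
batting = LOAD 'Batting.csv' USING PigStorage(',');
-- Assumption: the first line of Batting.csv is a header row.
-- Its year field cannot be converted to a number, so the comparison
-- evaluates to null and FILTER drops the row.
raw_runs = FILTER batting BY $1 > 0;
runs = FOREACH raw_runs GENERATE $0 AS playerID, $1 AS year, $8 AS runs;
grp_data = GROUP runs BY (year);
max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;
DUMP max_runs;
```

If this version completes, the JOIN and final FOREACH from the original script can be added back unchanged. If it still fails, the header is probably not the cause, and the log of the failed map task would be the next place to look.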

    #30506

    Robert
    Participant

    Hi Member,
    Have you verified Batting.csv is in the location /user/hue/ in the cluster?

    Kind Regards,
    Robert
