Hortonworks Sandbox Forum

Sandbox 1.3 Tutorial Error

  • #30486


    I’m running Sandbox 1.3 on VirtualBox and I’m getting an error running the Pig scripts that process the Batting.csv file. If I run the first part of the script, I get results:

    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    dump runs;

    But if I run the “full” tutorial script, I get an error. I’m new at interpreting these errors. Any thoughts appreciated!

    Here is my script. It’s possible I made a typo, but I also tried copying and pasting from the tutorial text:

    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    grp_data = GROUP runs by (year);
    max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
    join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
    join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
    dump join_data;

    Here is part of the error log:

    2013-07-31 11:15:41,409 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2106: Error executing an algebraic function
    2013-07-31 11:15:41,410 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
    2013-07-31 11:15:41,423 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    mapred 2013-07-31 11:14:11 2013-07-31 11:15:41 HASH_JOIN,GROUP_BY


    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_201307311011_0002 batting,grp_data,max_runs,runs MULTI_QUERY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201307311011_0002_m_000000

    Failed to read data from "hdfs://sandbox:8020/user/hue/Batting.csv"

  • #30506

    Hi Member,
    Have you verified Batting.csv is in the location /user/hue/ in the cluster?

    Kind Regards,



    As I mentioned, the partial sample works just fine and reads the file.

    The actual error seems to be:
    ERROR 2106: Error executing an algebraic function
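
    One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: the thread does not say whether Batting.csv starts with a header row of column names. If it does (an assumption), the partial script would still dump every line as raw text, while MAX in the full script could fail when it tries to treat the header's text as a number, which might surface as ERROR 2106. The variant below skips any row whose year field is not a positive number before grouping. The alias raw_runs is a hypothetical name introduced here; everything else is the script from the first post.

    ```pig
    batting = LOAD 'Batting.csv' USING PigStorage(',');
    -- Assumption: the first line of the file is a header row.
    -- Comparing $1 with 0 makes Pig treat the year field as a number;
    -- a non-numeric value such as a column name becomes null and is dropped.
    raw_runs = FILTER batting BY $1 > 0;
    runs = FOREACH raw_runs GENERATE $0 AS playerID, $1 AS year, $8 AS runs;
    grp_data = GROUP runs BY (year);
    max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;
    join_max_run = JOIN max_runs BY ($0, max_runs), runs BY (year, runs);
    join_data = FOREACH join_max_run GENERATE $0 AS year, $2 AS playerID, $1 AS runs;
    DUMP join_data;
    ```

    If the failure persists with the filter in place, the header row is probably not the cause, and the task log for task_201307311011_0002_m_000000 would be the next place to look.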
