The legacy Hortonworks Forum is now closed; a read-only version of the former site remains available. The site will be taken offline on January 31, 2016.

Hortonworks Sandbox Forum

Sandbox 1.3 Tutorial Error

  • #30486


    I’m running Sandbox 1.3 on VirtualBox, and I’m getting an error when running the Pig scripts that process the Batting.csv file. If I run the first part of the script, I get results:

    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    dump runs;

    But if I run the “full” tutorial script, I get an error. I’m new to interpreting these errors. Any thoughts appreciated!

    Here is my script. It’s possible I made a typo, but I also tried cutting and pasting from the tutorial text:

    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    grp_data = GROUP runs by (year);
    max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
    join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
    join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
    dump join_data;

    Here is part of the error log:

    2013-07-31 11:15:41,409 [main] ERROR - ERROR 2106: Error executing an algebraic function
    2013-07-31 11:15:41,410 [main] ERROR - 1 map reduce job(s) failed!
    2013-07-31 11:15:41,423 [main] INFO - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    mapred 2013-07-31 11:14:11 2013-07-31 11:15:41 HASH_JOIN,GROUP_BY

    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_201307311011_0002 batting,grp_data,max_runs,runs MULTI_QUERY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201307311011_0002_m_000000

    Failed to read data from "hdfs://sandbox:8020/user/hue/Batting.csv"
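For reference, the full script above keeps, for each year, the player or players whose runs equal that year's maximum. It groups by year, takes MAX over runs, then joins the maxima back to the per-player rows on (year, runs). Below is a minimal Python sketch of the same dataflow. It assumes each row is already split into string fields, with playerID in column 0, year in column 1, and runs in column 8 as in the script, and that only data rows are present (no header line). The function name `max_runs_per_year` is hypothetical.

```python
def max_runs_per_year(rows):
    """Sketch of the Pig dataflow: project, GROUP BY year, MAX, JOIN back.

    rows: list of already-split CSV rows (lists of strings), data rows only.
    Returns sorted (year, playerID, runs) tuples. Every player whose runs
    equal the yearly maximum is kept, as the JOIN would keep every match.
    """
    # runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    runs = [(r[0], r[1], int(r[8])) for r in rows]

    # grp_data + max_runs: highest run total seen for each year
    max_by_year = {}
    for _player, year, n in runs:
        max_by_year[year] = max(n, max_by_year.get(year, n))

    # join_max_run + join_data: keep rows whose (year, runs) equals the yearly max
    return sorted(
        (year, player, n) for player, year, n in runs if n == max_by_year[year]
    )
```

Ties are preserved: two players with the same maximum in one year both appear, just as both would match in the Pig JOIN.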

  • #30506

    Hi Member,
    Have you verified that Batting.csv is in /user/hue/ on the cluster?

    Kind Regards,



    As I mentioned, the partial script works just fine and reads the file.

    The actual error seems to be:
    ERROR 2106: Error executing an algebraic function
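The thread never confirms the cause, but one plausible explanation fits the symptoms. The partial script never evaluates MAX, so it succeeds. The full script's MAX(runs.runs) must convert every runs value to a number. If Batting.csv begins with a header line of column names, that header's text in column 8 cannot be converted, and the failure would surface inside the algebraic MAX function. The minimal Python sketch below shows the failure and the workaround of filtering the header out first. Both function names and the header contents are assumptions for illustration.

```python
def naive_max_runs(rows):
    # Mirrors MAX(runs.runs) on untyped fields: every value is converted to a number.
    # A header row (hypothetical, e.g. [..., "R"] in column 8) makes float() raise ValueError.
    return max(float(r[8]) for r in rows)


def filtered_max_runs(rows):
    # Drop rows whose year column is not numeric (such as a header line)
    # before taking the max, roughly what a FILTER on $1 would do in Pig.
    data = [r for r in rows if r[1].isdigit()]
    return max(float(r[8]) for r in data)
```

In the Pig script itself, the analogous change would be a filter step before the FOREACH, for example `raw_runs = FILTER batting BY $1 > 0;`, with `runs` then generated from `raw_runs` instead of `batting`. This is an untested suggestion, not a fix confirmed in this thread.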

The forum ‘Hortonworks Sandbox’ is closed to new topics and replies.
