Hortonworks Sandbox Forum

Sandbox 1.3 Tutorial Error

  • #30486
    Lamprey
    Member

    Hi,

    I’m running the Sandbox 1.3 on VirtualBox and I’m getting an error running the Pig scripts to process the Batting.csv file. If I run the first part of the script, I get results:

    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    dump runs;

    But if I run the “full” tutorial, I get an error. I’m new to interpreting these errors. Any thoughts appreciated!

    Here is my script. It’s possible I made a typo, but I also tried copying and pasting from the tutorial text:

    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    grp_data = GROUP runs by (year);
    max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
    join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
    join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
    dump join_data;

    Here is part of the error log:

    2013-07-31 11:15:41,409 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2106: Error executing an algebraic function
    2013-07-31 11:15:41,410 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
    2013-07-31 11:15:41,423 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    1.2.0.1.3.0.0-107 0.11.1.1.3.0.0-107 mapred 2013-07-31 11:14:11 2013-07-31 11:15:41 HASH_JOIN,GROUP_BY

    Failed!

    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_201307311011_0002 batting,grp_data,max_runs,runs MULTI_QUERY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201307311011_0002_m_000000

    Input(s):
    Failed to read data from "hdfs://sandbox:8020/user/hue/Batting.csv"

  • #30506
    Robert
    Participant

    Hi Member,
    Have you verified Batting.csv is in the location /user/hue/ in the cluster?

    Kind Regards,
    Robert

  • #30518
    Lamprey
    Member

    Yes.

    As I mentioned, the partial sample works just fine and reads the file.

    The actual error seems to be:
    ERROR 2106: Error executing an algebraic function
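
The thread ends without a resolution. One possible cause, offered here as an assumption rather than a confirmed diagnosis, is that Batting.csv begins with a header row. The partial script only dumps the fields, so a non-numeric header value passes through unnoticed, whereas the full script applies MAX to the runs column and could fail on that value. A minimal sketch of a workaround, assuming the header row is the culprit, filters it out before the projection. The alias raw_runs is introduced here for illustration and does not appear in the original post:

```pig
batting = LOAD 'Batting.csv' USING PigStorage(',');

-- Assumption: the first row is a header, so its year field ($1) is not a number.
-- Comparing it with 0 yields null for that row, and FILTER drops it.
raw_runs = FILTER batting BY $1 > 0;

-- The rest is the script from the original post, reading from raw_runs instead of batting.
runs = FOREACH raw_runs GENERATE $0 AS playerID, $1 AS year, $8 AS runs;
grp_data = GROUP runs BY (year);
max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;
join_max_run = JOIN max_runs BY ($0, max_runs), runs BY (year, runs);
join_data = FOREACH join_max_run GENERATE $0 AS year, $2 AS playerID, $1 AS runs;
DUMP join_data;
```

If the error persists with the header removed, the task log for the failed map task (task_201307311011_0002_m_000000 in the output above) is the next place to look for the underlying exception.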
