Pig Forum

HadoopJobHistoryLoader fails with java.lang.ArrayIndexOutOfBoundsException

  • #55922

    I’m running HDP 1.3.2 with the Pig version that ships with it, and HadoopJobHistoryLoader fails with the exception below. I also tried Pig 0.12.1, downloaded directly from Apache, and I get the same error.

    Backend error message
    java.lang.ArrayIndexOutOfBoundsException: 2
    at org.apache.pig.piggybank.storage.HadoopJobHistoryLoader$HadoopJobHistoryReader.nextKeyValue(HadoopJobHistoryLoader.java:184)
    at org.apache.pig.piggybank.storage.HadoopJobHistoryLoader.getNext(HadoopJobHistoryLoader.java:81)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:530)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:363)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

    The script is below:

    REGISTER /usr/lib/pig/piggybank.jar;
    a = LOAD '/mapred/history/done'
        USING org.apache.pig.piggybank.storage.HadoopJobHistoryLoader()
        AS (j:map[], m:map[], r:map[]);
    b = GROUP a BY j#'JOBNAME' PARALLEL 5;
    STORE b INTO '/user/nzmaprd/processed';
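
    The message "ArrayIndexOutOfBoundsException: 2" means some code read element 2 of an array holding fewer than three elements. One hypothesis, which is an assumption and not confirmed against the loader source, is that the reader builds a job ID from the underscore-separated parts of each history file name, and that something under /mapred/history/done does not follow that naming pattern. The sketch below illustrates that assumed failure mode with a guarded version of the lookup. The class name, the method name, the assumed file-name layout, and both sample names are hypothetical and are not taken from piggybank or from the cluster.

```java
// Hypothetical helper, not part of piggybank: it only illustrates the
// assumed failure mode described above.
public class HistoryFileNameCheck {

    // Assumed layout (an assumption, not confirmed):
    //   <host>_<startTime>_job_<clusterTimestamp>_<sequence>_<user>_<jobName>
    // Returns the job ID, or null when the name has too few parts.
    static String jobIdOrNull(String fileName) {
        String[] parts = fileName.split("_");
        // An unguarded read of parts[2] would throw
        // ArrayIndexOutOfBoundsException: 2 when there are fewer than 3 parts.
        if (parts.length < 5) {
            return null;
        }
        return parts[2] + "_" + parts[3] + "_" + parts[4];
    }

    public static void main(String[] args) {
        // Both names are made up for illustration.
        String[] samples = {
            "jt.example.com_1393900000000_job_201403041234_0001_nzmaprd_wordcount",
            "version-1"
        };
        for (String name : samples) {
            System.out.println(name + " -> " + jobIdOrNull(name));
        }
    }
}
```

    If this hypothesis holds, listing the input directory and looking for entries whose names have fewer than three underscore-separated parts would show which path triggers the exception. Pointing LOAD at a narrower path that contains only job history files would then be a reasonable thing to try.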
