Type and Division throwing error…

This topic contains 2 replies, has 2 voices, and was last updated by Neo Kronic 2 weeks, 6 days ago.

  • Creator
    Topic
  • #54042

    Al Zyck
    Participant

    I’m using the 2.1 Sandbox and trying to run the Pig script below. When I do, it throws an error whenever I try to cast an int to a double, or even if I just try to divide. The job runs for a few minutes and generates quite a few M/R success messages, then fails. I can’t figure out the issue or a workaround…

    Query…
    records = LOAD '/user/DataSmart/chap3/GoodTweets.txt' USING PigStorage('\t') AS (tweet:chararray);
    A1 = FOREACH records GENERATE LOWER(tweet) as (a1tweet:chararray);
    A2 = FOREACH A1 GENERATE REPLACE(a1tweet, '\\. ', ' ') as (a2tweet:chararray);
    A3 = FOREACH A2 GENERATE REPLACE(a2tweet, '\\"', '') as (a3tweet:chararray);
    A4 = FOREACH A3 GENERATE REPLACE(a3tweet, '\\?', '') as (a4tweet:chararray);
    A5 = FOREACH A4 GENERATE REPLACE(a4tweet, ': ', ' ') as (a5tweet:chararray);
    A6 = FOREACH A5 GENERATE REPLACE(a5tweet, '[!,;]', '') as (a6tweet:chararray);
    A7 = FOREACH A6 GENERATE REPLACE(a6tweet, ' ', ' ') as (a7tweet:chararray);
    A8 = FOREACH A7 GENERATE FLATTEN(TOKENIZE((chararray)$0)) as word;
    A9 = GROUP A8 by word;
    A10 = FILTER A9 BY SIZE($0) > 3;
    A11 = FOREACH A10 GENERATE 'same' AS key, COUNT(A8) + 1 as (myCount:int), group as (word:chararray);
    A12 = GROUP A11 All;
    A13 = FOREACH A12 GENERATE 'same' AS key, COUNT(A11.word) as (totalCount:int);
    --DUMP A13;
    A14 = JOIN A13 by key, A11 by key;
    --DESCRIBE A14;
    A15 = FILTER A14 BY (A11::myCount IS NOT NULL);
    A16 = FOREACH A15 GENERATE A11::myCount as myCount, A13::totalCount as totalCount, A11::word as (word:chararray);
    --DESCRIBE A16;
    calcProb = FOREACH A16 GENERATE (double)myCount as (probability:double); --, word;
    DUMP calcProb;

    Here are the error messages…
    2014-05-20 12:55:05,077 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
    2014-05-20 12:55:05,078 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1400598648465_0136 has failed! Stop running all dependent jobs
    2014-05-20 12:55:05,079 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2014-05-20 12:55:05,795 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 0: Exception while executing [POCast (Name: Cast[double] - scope-105 Operator Key: scope-105) children: [[POProject (Name: Project[int][0] - scope-104 Operator Key: scope-104) children: null at []]] at []]: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
    2014-05-20 12:55:05,796 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
    2014-05-20 12:55:05,801 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.4.0.2.1.1.0-385 0.12.1.2.1.1.0-385 yarn 2014-05-20 12:52:42 2014-05-20 12:55:05 HASH_JOIN,GROUP_BY,FILTER
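
    A note on the failure above: Pig’s COUNT built-in returns a long, while the script declares the counts as int (myCount:int, totalCount:int). The type in an AS clause labels the schema but does not convert the value, so the later (double)myCount cast expects an int, receives a Long at runtime, and throws the ClassCastException shown in the ERROR 0 line. Below is a minimal sketch of the affected lines with the counts declared as long (an untested rewrite suggested by the error text, not a confirmed fix):

    -- hypothetical rewrite: COUNT produces a long, so declare the counts as long
    A11 = FOREACH A10 GENERATE 'same' AS key, COUNT(A8) + 1 as (myCount:long), group as (word:chararray);
    -- A12 unchanged
    A13 = FOREACH A12 GENERATE 'same' AS key, COUNT(A11.word) as (totalCount:long);
    -- A14 through A16 unchanged; the (double) cast should then find a long and succeed
    calcProb = FOREACH A16 GENERATE (double)myCount as (probability:double);
    DUMP calcProb;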


  • Author
    Replies
  • #56958

    Neo Kronic
    Participant

    I’m getting the same cast exception error (Long to Integer), but in Hive 12. I’m working on the Sandbox 2.1 from Hortonworks. Did you find a solution?

  • #54045

    Al Zyck
    Participant

    Continuation of error message…

    Some jobs have failed! Stop running all dependent jobs

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
    job_1400598648465_0134 1 1 6 6 6 6 6 6 6 6 A1,A10,A11,A2,A3,A4,A5,A6,A7,A8,A9,records GROUP_BY,COMBINER
    job_1400598648465_0135 1 1 6 6 6 6 6 6 6 6 A12,A13 GROUP_BY,COMBINER

    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_1400598648465_0136 A14,A15,A16,calcProb HASH_JOIN Message: Job failed! hdfs://sandbox.hortonworks.com:8020/tmp/temp-2061532534/tmp-1733357242,

    Input(s):
    Successfully read 150 records (18583 bytes) from: "/user/DataSmart/chap3/GoodTweets.txt"

    Output(s):
    Failed to produce result in "hdfs://sandbox.hortonworks.com:8020/tmp/temp-2061532534/tmp-1733357242"

    2014-05-20 12:55:12,643 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: sandbox.hortonworks.com/192.168.2.132:52746. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
    2014-05-20 12:55:12,751 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2014-05-20 12:55:12,943 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
    2014-05-20 12:55:13,069 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias calcProb. Backend error : Exception while executing [POCast (Name: Cast[double] - scope-105 Operator Key: scope-105) children: [[POProject (Name: Project[int][0] - scope-104 Operator Key: scope-104) children: null at []]] at []]: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
    Details at logfile: /hadoop/yarn/local/usercache/hue/appcache/application_1400598648465_0133/container_1400598648465_0133_01_000002/pig_1400615556922.log
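
    The ERROR 1066 here is the front-end report of the same backend cast failure on calcProb. If the int schema is intentional, an alternative sketch (also untested) is to cast explicitly so the runtime value matches the declared type before the later (double) cast:

    -- hypothetical alternative: explicit casts make the runtime type match the declared int
    A11 = FOREACH A10 GENERATE 'same' AS key, (int)(COUNT(A8) + 1) as (myCount:int), group as (word:chararray);
    A13 = FOREACH A12 GENERATE 'same' AS key, (int)COUNT(A11.word) as (totalCount:int);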
