Null Pointer Exception after ILLUSTRATE

This topic contains 2 replies, has 2 voices, and was last updated by PaaKow Acquah 5 months, 1 week ago.

    Topic #53346

    PaaKow Acquah
    Participant

    I’m using the Hortonworks Sandbox 2.0 on VirtualBox, via the Hue Pig shell. The Pig version is 0.12.0.2.0.6.0-76 (rexported), compiled Oct 17 2013.

    I LOAD data from a text file that contains some lsof output:

    grunt> rawlog = LOAD '/user/hue/lsoftwo.log' as (COMMAND:chararray, PID:int, USER:chararray, FD:chararray, TYPE:chararray, DEVICE:chararray, SIZEOFF:chararray, NODE:chararray, N:chararray);
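
    (Since there is no USING clause, this uses PigStorage with its default tab delimiter; lsof output is space-padded, so I suspect that is why each whole line lands in COMMAND and the remaining fields come back null, per the trailing commas in the tuples and the ACCESSING_NON_EXISTENT_FIELD warning below. A space-delimited variant, untested here and still naive about lsof's variable-width padding, would look like:)

    grunt> rawlog = LOAD '/user/hue/lsoftwo.log' USING PigStorage(' ') AS (COMMAND:chararray, PID:int, USER:chararray, FD:chararray, TYPE:chararray, DEVICE:chararray, SIZEOFF:chararray, NODE:chararray, N:chararray);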

    I am then able to invoke DUMP, GROUP, STORE, etc.

    grunt> DUMP rawlog;
    ...
    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.2.0.2.0.6.0-76 0.12.0.2.0.6.0-76 hue 2014-05-07 13:01:44 2014-05-07 13:02:34 UNKNOWN

    Success!

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
    job_1399299137074_0002 1 0 14 14 14 14 n/a n/a n/a n/a rawlog MAP_ONLY hdfs://sandbox.hortonworks.com:8020/tmp/temp-513851403/tmp-1694870000,

    Input(s):
    Successfully read 1801 records (211911 bytes) from: "/user/hue/lsoftwo.log"

    Output(s):
    Successfully stored 1801 records (236750 bytes) in: "hdfs://sandbox.hortonworks.com:8020/tmp/temp-513851403/tmp-1694870000"

    Counters:
    Total records written : 1801
    Total bytes written : 236750
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    job_1399299137074_0002

    2014-05-07 13:02:34,500 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 14408 time(s).
    2014-05-07 13:02:34,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
    2014-05-07 13:02:34,512 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-07 13:02:34,513 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
    2014-05-07 13:02:34,540 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2014-05-07 13:02:34,541 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    (gnome-ses 1977 pacquah cwd DIR 8,2 4096 1572866 /home/pacquah,,,,,,,,)
    (gnome-ses 1977 pacquah rtd DIR 8,2 4096 2 /,,,,,,,,)
    (gnome-ses 1977 pacquah txt REG 8,2 248192 22413700 /usr/bin/gnome-session,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 26776 22420839 /usr/lib/x86_64-linux-gnu/libogg.so.0.7.1,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 178448 22420989 /usr/lib/x86_64-linux-gnu/libvorbis.so.0.4.5,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 39184 22420796 /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.0,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 67992 22420964 /usr/lib/x86_64-linux-gnu/libtdb.so.1.2.9,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 31040 22420993 /usr/lib/x86_64-linux-gnu/libvorbisfile.so.3.3.4,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 64288 22420522 /usr/lib/x86_64-linux-gnu/libcanberra.so.0.2.5,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 18784 22420520 /usr/lib/x86_64-linux-gnu/libcanberra-gtk3.so.0.1.8,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 23072 22677901 /usr/lib/x86_64-linux-gnu/gtk-3.0/modules/libcanberra-gtk3-module.so,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 23088 22677717 /usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 112012 22812407 /usr/share/mime/mime.cache,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 60104 22417725 /usr/lib/gtk-3.0/3.0.0/theming-engines/libunico.so,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 51080 27525394 /lib/x86_64-linux-gnu/libudev.so.0.13.0,,,,,,,,)
    ...

    If, however, I invoke ILLUSTRATE, I get an error:

    grunt> groupedByPid = GROUP rawlog BY PID;
    grunt> ILLUSTRATE groupedByPid;
    ...
    2014-05-07 13:26:22,636 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: rawlog[1,9],rawlog[-1,-1],groupedByPid[4,15] C: R:
    2014-05-07 13:26:22,641 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
    2014-05-07 13:26:22,644 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: rawlog[1,9],rawlog[-1,-1],groupedByPid[4,15] C: R:
    2014-05-07 13:26:22,656 [main] ERROR org.apache.pig.pen.AugmentBaseDataVisitor - No (valid) input data found!
    java.lang.RuntimeException: No (valid) input data found!
    at org.apache.pig.pen.AugmentBaseDataVisitor.visit(AugmentBaseDataVisitor.java:585)
    at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:230)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:82)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:84)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.walk(PreOrderDepthFirstWalker.java:66)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:180)
    at org.apache.pig.PigServer.getExamples(PigServer.java:1238)
    at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:831)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:802)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:381)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:541)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    2014-05-07 13:26:22,660 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception
    Details at logfile: /dev/null
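
    (Despite the "No (valid) input data found!" message, the input file is still present; for example, checking from the grunt shell:)

    grunt> fs -ls /user/hue/lsoftwo.log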

    After that, I am no longer able to LOAD, DUMP, GROUP, or STORE without hitting a NullPointerException:

    grunt> dump groupedByPid;
    2014-05-07 13:27:28,464 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
    2014-05-07 13:27:28,466 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[DuplicateForEachColumnRewrite, ImplicitSplitInserter, LoadTypeCastInserter, NewPartitionFilterOptimizer, StreamTypeCastInserter], RULES_DISABLED=[AddForEach, ColumnMapKeyPrune, FilterLogicExpressionSimplifier, GroupByConstParallelSetter, LimitOptimizer, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter]}
    2014-05-07 13:27:28,478 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
    2014-05-07 13:27:28,481 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2014-05-07 13:27:28,481 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2014-05-07 13:27:28,510 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
    2014-05-07 13:27:28,513 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
    2014-05-07 13:27:28,515 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2014-05-07 13:27:28,516 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
    2014-05-07 13:27:28,516 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
    2014-05-07 13:27:28,520 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=211536
    2014-05-07 13:27:28,520 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
    2014-05-07 13:27:28,536 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2014-05-07 13:27:28,565 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
    2014-05-07 13:27:28,565 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
    2014-05-07 13:27:28,566 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
    2014-05-07 13:27:28,575 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2014-05-07 13:27:28,588 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
    2014-05-07 13:27:28,645 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
    java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
    2014-05-07 13:27:29,077 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2014-05-07 13:27:34,102 [main] WARN org.apache.pig.tools.pigstats.JobStats - unable to get stores of the job
    2014-05-07 13:27:34,103 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
    2014-05-07 13:27:34,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
    2014-05-07 13:27:34,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2014-05-07 13:27:34,131 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: null
    2014-05-07 13:27:34,132 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
    2014-05-07 13:27:34,132 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.2.0.2.0.6.0-76 0.12.0.2.0.6.0-76 hue 2014-05-07 13:27:28 2014-05-07 13:27:34 GROUP_BY

    Failed!

    Failed Jobs:
    JobId Alias Feature Message Outputs
    N/A groupedByPid,rawlog GROUP_BY Message: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)

    Input(s):
    Failed to read data from "/user/hue/lsoftwo.log"

    Output(s):

    Counters:
    Total records written : 0
    Total bytes written : 0
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    null

    2014-05-07 13:27:34,132 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
    2014-05-07 13:27:34,133 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias groupedByPid
    Details at logfile: /dev/null

    This persists until I reboot the VM. The input file remains unchanged. Am I doing something wrong?


Replies

    Reply #53603

    PaaKow Acquah
    Participant

    Thanks for your reply!

    I was able to DUMP both rawlog and groupings of it on several of its fields, for example as sketched below. The NullPointerException occurs after the first invocation of ILLUSTRATE, whether it is invoked on rawlog or on a grouping.
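
    (Representative commands, not an exact transcript; groupedByUser here is just an illustrative alias:)

    grunt> groupedByUser = GROUP rawlog BY USER;
    grunt> DUMP groupedByUser;  -- fine before any ILLUSTRATE
    grunt> ILLUSTRATE rawlog;   -- after the first ILLUSTRATE, subsequent operations fail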

    I tried again with Sandbox 2.1 and saw the same result.

    Did you change any configuration before your attempt?

    Reply #53585

    iandr413
    Moderator

    Hi PaaKow,
    I would try dumping the groupedByPid data prior to ILLUSTRATE to make sure there is nothing wrong with that relation. I do not believe the issue is with ILLUSTRATE itself, since you cannot operate on groupedByPid at all. You could even try running ILLUSTRATE against rawlog, as in the sketch below. I just ran a quick test on my instance and did not have any issues with similar operations. I hope this helps.
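
    Something along these lines, using the aliases from your session:

    grunt> DUMP groupedByPid;   -- sanity-check the grouped relation first
    grunt> ILLUSTRATE rawlog;   -- then try ILLUSTRATE against the base relation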

    Ian
