
Hortonworks Sandbox Forum

Null Pointer Exception after ILLUSTRATE

  • #53346
    PaaKow Acquah
    Participant

I’m using the Hortonworks Sandbox 2.0 on VirtualBox, via the Hue Pig shell. The Pig version is 0.12.0.2.0.6.0-76 (rexported), compiled Oct 17 2013.

I load data from a text file containing some lsof output:

grunt> rawlog = LOAD '/user/hue/lsoftwo.log' as (COMMAND:chararray, PID:int, USER:chararray, FD:chararray, TYPE:chararray, DEVICE:chararray, SIZEOFF:chararray, NODE:chararray, N:chararray);

    I am then able to invoke DUMP, GROUP, STORE, etc.

grunt> DUMP rawlog;
    ...
    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.2.0.2.0.6.0-76 0.12.0.2.0.6.0-76 hue 2014-05-07 13:01:44 2014-05-07 13:02:34 UNKNOWN

    Success!

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
    job_1399299137074_0002 1 0 14 14 14 14 n/a n/a n/a n/a rawlog MAP_ONLY hdfs://sandbox.hortonworks.com:8020/tmp/temp-513851403/tmp-1694870000,

    Input(s):
    Successfully read 1801 records (211911 bytes) from: "/user/hue/lsoftwo.log"

    Output(s):
    Successfully stored 1801 records (236750 bytes) in: "hdfs://sandbox.hortonworks.com:8020/tmp/temp-513851403/tmp-1694870000"

    Counters:
    Total records written : 1801
    Total bytes written : 236750
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    job_1399299137074_0002

    2014-05-07 13:02:34,500 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 14408 time(s).
    2014-05-07 13:02:34,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
    2014-05-07 13:02:34,512 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-07 13:02:34,513 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
    2014-05-07 13:02:34,540 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2014-05-07 13:02:34,541 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    (gnome-ses 1977 pacquah cwd DIR 8,2 4096 1572866 /home/pacquah,,,,,,,,)
    (gnome-ses 1977 pacquah rtd DIR 8,2 4096 2 /,,,,,,,,)
    (gnome-ses 1977 pacquah txt REG 8,2 248192 22413700 /usr/bin/gnome-session,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 26776 22420839 /usr/lib/x86_64-linux-gnu/libogg.so.0.7.1,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 178448 22420989 /usr/lib/x86_64-linux-gnu/libvorbis.so.0.4.5,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 39184 22420796 /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.0,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 67992 22420964 /usr/lib/x86_64-linux-gnu/libtdb.so.1.2.9,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 31040 22420993 /usr/lib/x86_64-linux-gnu/libvorbisfile.so.3.3.4,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 64288 22420522 /usr/lib/x86_64-linux-gnu/libcanberra.so.0.2.5,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 18784 22420520 /usr/lib/x86_64-linux-gnu/libcanberra-gtk3.so.0.1.8,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 23072 22677901 /usr/lib/x86_64-linux-gnu/gtk-3.0/modules/libcanberra-gtk3-module.so,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 23088 22677717 /usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 112012 22812407 /usr/share/mime/mime.cache,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 60104 22417725 /usr/lib/gtk-3.0/3.0.0/theming-engines/libunico.so,,,,,,,,)
    (gnome-ses 1977 pacquah mem REG 8,2 51080 27525394 /lib/x86_64-linux-gnu/libudev.so.0.13.0,,,,,,,,)
    ...

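(As an aside: the trailing commas in each tuple, together with the ACCESSING_NON_EXISTENT_FIELD warning, suggest the file isn’t tab-delimited, so the whole line lands in the first field and the remaining eight come back null. An untested sketch of a load that splits on a single space instead; note that lsof pads its columns with runs of spaces, so this would likely still need pre-processing or a regex-based split:)

```pig
-- Hypothetical variant: split on single spaces rather than the default tab.
-- lsof output is space-padded, so consecutive delimiters may still misalign
-- fields; a robust fix would pre-process the file or use STRSPLIT.
rawlog = LOAD '/user/hue/lsoftwo.log' USING PigStorage(' ')
         AS (COMMAND:chararray, PID:int, USER:chararray, FD:chararray,
             TYPE:chararray, DEVICE:chararray, SIZEOFF:chararray,
             NODE:chararray, N:chararray);
```
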
    If, however, I invoke ILLUSTRATE, I get an error:

grunt> groupedByPid = GROUP rawlog BY PID;
grunt> ILLUSTRATE groupedByPid;
    ...
    2014-05-07 13:26:22,636 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: rawlog[1,9],rawlog[-1,-1],groupedByPid[4,15] C: R:
    2014-05-07 13:26:22,641 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
    2014-05-07 13:26:22,644 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: rawlog[1,9],rawlog[-1,-1],groupedByPid[4,15] C: R:
    2014-05-07 13:26:22,656 [main] ERROR org.apache.pig.pen.AugmentBaseDataVisitor - No (valid) input data found!
    java.lang.RuntimeException: No (valid) input data found!
    at org.apache.pig.pen.AugmentBaseDataVisitor.visit(AugmentBaseDataVisitor.java:585)
    at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:230)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:82)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:84)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.walk(PreOrderDepthFirstWalker.java:66)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:180)
    at org.apache.pig.PigServer.getExamples(PigServer.java:1238)
    at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:831)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:802)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:381)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:541)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    2014-05-07 13:26:22,660 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception
    Details at logfile: /dev/null

After that, I am no longer able to LOAD, DUMP, GROUP, or STORE without a null pointer exception:

grunt> dump groupedByPid;
    2014-05-07 13:27:28,464 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
    2014-05-07 13:27:28,466 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[DuplicateForEachColumnRewrite, ImplicitSplitInserter, LoadTypeCastInserter, NewPartitionFilterOptimizer, StreamTypeCastInserter], RULES_DISABLED=[AddForEach, ColumnMapKeyPrune, FilterLogicExpressionSimplifier, GroupByConstParallelSetter, LimitOptimizer, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter]}
    2014-05-07 13:27:28,478 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
    2014-05-07 13:27:28,481 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2014-05-07 13:27:28,481 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2014-05-07 13:27:28,510 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
    2014-05-07 13:27:28,513 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
    2014-05-07 13:27:28,515 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2014-05-07 13:27:28,516 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
    2014-05-07 13:27:28,516 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
    2014-05-07 13:27:28,520 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=211536
    2014-05-07 13:27:28,520 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
    2014-05-07 13:27:28,536 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2014-05-07 13:27:28,565 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
    2014-05-07 13:27:28,565 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
    2014-05-07 13:27:28,566 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
    2014-05-07 13:27:28,575 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2014-05-07 13:27:28,588 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
    2014-05-07 13:27:28,645 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
    java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
    2014-05-07 13:27:29,077 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2014-05-07 13:27:34,102 [main] WARN org.apache.pig.tools.pigstats.JobStats - unable to get stores of the job
    2014-05-07 13:27:34,103 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
    2014-05-07 13:27:34,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
    2014-05-07 13:27:34,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2014-05-07 13:27:34,131 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: null
    2014-05-07 13:27:34,132 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
    2014-05-07 13:27:34,132 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.2.0.2.0.6.0-76 0.12.0.2.0.6.0-76 hue 2014-05-07 13:27:28 2014-05-07 13:27:34 GROUP_BY

    Failed!

    Failed Jobs:
    JobId Alias Feature Message Outputs
    N/A groupedByPid,rawlog GROUP_BY Message: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)

    Input(s):
    Failed to read data from "/user/hue/lsoftwo.log"

    Output(s):

    Counters:
    Total records written : 0
    Total bytes written : 0
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    null

    2014-05-07 13:27:34,132 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
    2014-05-07 13:27:34,133 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias groupedByPid
    Details at logfile: /dev/null

This persists until I reboot the VM. The input file remains unchanged. Am I doing something wrong?

  • #53585
    iandr413
    Moderator

    Hi PaaKow,
    I would try dumping the groupedByPid data prior to ILLUSTRATE to make sure there is nothing wrong with that data structure. I do not believe the issue is with ILLUSTRATE itself, since afterwards you cannot operate on groupedByPid at all. You could also try running ILLUSTRATE against rawlog. I just ran a quick test on my instance and did not have any issues with similar operations. I hope this helps.
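
    Concretely, a sketch of the suggested checks in the grunt shell:

    ```pig
    -- Sanity-check the grouped relation first, then try ILLUSTRATE
    -- on the base relation to see whether the error is specific to the group.
    grunt> DUMP groupedByPid;
    grunt> ILLUSTRATE rawlog;
    ```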

    Ian

    #53603
    PaaKow Acquah
    Participant

    Thanks for your reply!

    I was able to dump both rawlog and groupings of it by several of its fields. The NullPointerException occurs only after the first invocation of ILLUSTRATE, whether it’s invoked on rawlog or on a group.

    I tried again with Sandbox 2.1 and saw the same result.

    Did you change any configuration before your attempt?
