Thank you Ramanan,
That was it. I just needed to register the HadoopCompat.jar. Now it works!
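For anyone who hits the same error, here is a rough sketch of what the fix looks like at the top of the script. The jar paths and file names are placeholders I made up for illustration (use whatever your Elephant-bird build actually produced), and the extra jars besides the HadoopCompat one are my assumption about what the JSON loader typically needs, not something confirmed in this thread:

```pig
-- All paths and jar names below are hypothetical placeholders; substitute your own build output.
-- The jar containing com.twitter.elephantbird.util.HadoopCompat (the class named in ERROR 2998):
REGISTER '/path/to/elephant-bird-hadoop-compat.jar';
-- Assumed additional dependencies of the JSON loader:
REGISTER '/path/to/elephant-bird-core.jar';
REGISTER '/path/to/elephant-bird-pig.jar';
REGISTER '/path/to/json-simple.jar';

-- Same two statements as in the original question, unchanged:
A = LOAD 'tweets.20131201-215958.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
tweets = FOREACH A GENERATE (CHARARRAY)$0#'id' AS id;
```

The error message names a class path (com/twitter/elephantbird/util/HadoopCompat) with no further explanation, which usually points to a class missing from the classpath, so registering the jar that contains it before the LOAD statement is the relevant step.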
Hello, I want to use Twitter's Elephant-bird to analyze tweets in their original JSON format, without first having to convert them to another format such as CSV.
I have built Elephant-bird and I wrote the following simple code to load tweets from a file, following some examples I saw:
A = LOAD 'tweets.20131201-215958.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
tweets = FOREACH A GENERATE (CHARARRAY)$0#'id' AS id;
and I get the following error:
2013-12-19 07:55:55,364 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-12-19 07:55:55,367 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-12-19 07:55:55,370 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
Details at logfile: /hadoop/yarn/local/usercache/rmrodriguez/appcache/application_1387366430472_0012/container_1387366430472_0012_01_000002/pig_1387457753323.log
Does anyone with Elephant-bird experience know the cause of this error, or can anyone suggest another way to load tweets in JSON format?