Thank you very much for sharing your experience!
I am trying to write a Pig script in Hortonworks Sandbox 1.3 that reads a file from HDFS and stores it in an HBase table. It works in local mode, but I get an error when I launch Pig in pseudo-distributed mode in the sandbox.
Here is the script:
raw_data = LOAD '/user/krishna/data/pig/customers.csv' USING PigStorage(',') AS (cust_id:chararray, fname:chararray, lname:chararray, city:chararray, obese:chararray, terminal_patient:chararray);
STORE raw_data INTO 'hbase://customer' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('demographic:first_name demographic:last_name demographic:city health:obese health:terminal_patient');
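For reference, HBaseStorage in MapReduce mode needs the HBase (and ZooKeeper) jars on the job classpath, which local mode does not. A hedged sketch of launching the script that way; the jar locations and the script filename (`load_customers.pig`) are assumptions for the Sandbox layout, not verified:

```shell
# Sketch only: jar paths are assumed for Sandbox 1.3; adjust to your install.
HBASE_JARS=$(ls /usr/lib/hbase/hbase-*.jar /usr/lib/zookeeper/zookeeper-*.jar 2>/dev/null | tr '\n' ':')

# pig.additional.jars ships the listed jars with the MapReduce job,
# so the map tasks can load org.apache.pig.backend.hadoop.hbase.HBaseStorage's dependencies.
pig -Dpig.additional.jars="${HBASE_JARS%:}" load_customers.pig
```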
I get the following error:
2013-07-24 08:54:40,660 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-07-24 08:54:40,660 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307240711_0003 has failed! Stop running all dependent jobs
2013-07-24 08:54:40,660 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-07-24 08:54:40,665 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-07-24 08:54:40,666 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.2.0.1.3.0.0-107 0.11.1.1.3.0.0-107 root 2013-07-24 08:53:46 2013-07-24 08:54:40 UNKNOWN
JobId Alias Feature Message Outputs
job_201307240711_0003 raw_data MAP_ONLY Message: Job failed! Error - JobCleanup Task Failure, Task: task_201307240711_0003_m_000001 hbase://customer,
Failed to read data from "/user/root/customers.csv"
Failed to produce result in "hbase://customer"
The file does exist in HDFS; I have had the Pig script successfully read it and dump it to another HDFS file and to the console.
The problem appears only when I try to STORE it to HBase; that is when Pig no longer seems to be able to read from HDFS.
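One detail that may matter: the error mentions /user/root/customers.csv, not the path in the LOAD statement, so it may be worth double-checking both the input path and the target table before each run. A sketch (the table name `customer` follows the script above):

```shell
# Verify the input file the script LOADs actually exists in HDFS.
hadoop fs -ls /user/krishna/data/pig/customers.csv

# HBaseStorage does not create tables; 'customer' must already exist.
echo "exists 'customer'" | hbase shell
```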
I had to set the correct HADOOP_HOME and HBASE_HOME environment variables before launching Pig to be able to run the grunt commands above.
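For completeness, the environment setup I mean looks roughly like this; the install paths are assumptions for the Sandbox, not verified:

```shell
# Assumed Sandbox 1.3 locations; adjust if your layout differs.
export HADOOP_HOME=/usr/lib/hadoop
export HBASE_HOME=/usr/lib/hbase

# PIG_CLASSPATH puts the Hadoop and HBase config dirs on Pig's classpath,
# so Pig picks up the cluster (not local) filesystem and HBase settings.
export PIG_CLASSPATH="$HBASE_HOME/conf:$HADOOP_HOME/conf"

pig   # launch grunt in MapReduce mode
```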
Any help is appreciated. I would like to know whether this is a bug in the Sandbox 1.3 environment.