
The legacy Hortonworks Forum is now closed; a read-only version of the former site remains available. The site will be taken offline on January 31, 2016.

Pig Forum

Error reading file from hdfs and storing in hbase using Pig

  • #29992

    I am trying to write a Pig script in the Hortonworks Sandbox 1.3 that reads a file from HDFS and stores it in an HBase table. I have it working successfully in local mode, but I get an error when I launch Pig in pseudo-distributed mode in the sandbox.

    Here is the script:
    raw_data = LOAD '/user/krishna/data/pig/customers.csv' USING PigStorage(',') AS (cust_id:chararray, fname:chararray, lname:chararray, city:chararray, obese:chararray, terminal_patient:chararray);

    STORE raw_data INTO 'hbase://customer' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('demographic:first_name demographic:last_name demographic:city health:obese health:terminal_patient');
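One thing worth checking before running a store like this: HBaseStorage writes into an existing table (it does not create one), and it uses the first field of each tuple (here cust_id) as the row key. A minimal sketch of pre-creating the target table in the hbase shell, using the table and column-family names from the script above:

```shell
# Create the 'customer' table with the two column families referenced
# in the STORE statement: demographic and health.
hbase shell <<'EOF'
create 'customer', 'demographic', 'health'
EOF
```

If the table or either column family is missing, the store job fails at runtime rather than at script-parse time.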

    I get the following error:
    2013-07-24 08:54:40,660 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
    2013-07-24 08:54:40,660 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307240711_0003 has failed! Stop running all dependent jobs
    2013-07-24 08:54:40,660 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2013-07-24 08:54:40,665 [main] ERROR - 1 map reduce job(s) failed!
    2013-07-24 08:54:40,666 [main] INFO - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    root 2013-07-24 08:53:46 2013-07-24 08:54:40 UNKNOWN


    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_201307240711_0003 raw_data MAP_ONLY Message: Job failed! Error - JobCleanup Task Failure, Task: task_201307240711_0003_m_000001 hbase://customer,

    Failed to read data from "/user/root/customers.csv"

    Failed to produce result in "hbase://customer"

    The file does exist in HDFS; I have had the Pig script successfully read it and dump it to another HDFS file and to the console.
    The problem only appears when I try to STORE it to HBase: that is when Pig doesn't seem to know how to read from HDFS.
    I had to set the correct HADOOP_HOME and HBASE_HOME environment variables before launching Pig to be able to run the grunt commands above.

    Any help is appreciated. I want to know whether this is a bug in the Sandbox 1.3 environment.

  • Author
  • #29999

    Hi Krishna,

    This looks like it could be due to permissions. What user is the script being run as? Does that user have permission to read the file?



    Hi Ted,
    I uploaded the file to HDFS using the Hue GUI. I see that the file belongs to user "hue" and group "hdfs" with the following permissions (-rwxr-xr-x).

    I am SSH'ed in as "root" before launching Pig for the Grunt shell. Do you suggest launching the Pig script as "hue" to make sure it has access to the HDFS file? Is there a way to assign permissions to files in HDFS?
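On the permissions question: HDFS supports chown/chmod through the Hadoop filesystem shell. A sketch using the path from the script earlier in this thread (run the chown as the HDFS superuser; adjust the path to match your layout):

```shell
# Transfer ownership of the file to the user running the Pig script:
hadoop fs -chown root:hdfs /user/krishna/data/pig/customers.csv

# Or widen the permissions so any user can read it:
hadoop fs -chmod 644 /user/krishna/data/pig/customers.csv
```

Note that the -rwxr-xr-x permissions you describe already make the file world-readable, so a plain read as "root" should not be blocked by HDFS permissions alone.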



    By the way, if I change the script's STORE command to write to HDFS using PigStorage instead of HBaseStorage, everything works fine. So, strangely, it is the HBase STORE command that leaves Pig unable to read the HDFS file.



    I was finally able to resolve the problem after spending countless hours trying out various things and googling related issues others have faced. Here is the solution for anyone who runs into a similar problem:

    In the HDP 1.3 Sandbox, before venturing into Pig development that uses HBaseStorage (or other serious HBase development), you have to do the following to be able to read and write HBase tables successfully:

    1) Edit /usr/lib/pig/conf/
    Add: export HBASE_HOME=/usr/lib/hbase

    2) Edit /usr/lib/hadoop/
    Add the HBase jars to the Hadoop classpath:
    for f in $HBASE_HOME/lib/*.jar; do
      HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
    done

    Reboot the VM so that the new classpath takes effect on all the MR nodes:
    $ shutdown -r now

    3) Edit /usr/lib/hbase/conf/hbase-site.xml

    Change the value of the property "zookeeper.znode.parent"
    from "/hbase-unsecure" to "/hbase".
    Without this change you will not be able to retrieve the MasterServer from the ZooKeeper connection.
    I am using the HareDB HBase client to visualize my HBase tables and data, and it would not connect to the HBase server without this change. I think this is some kind of compatibility issue between the various components.
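For reference, after the edit the property stanza in hbase-site.xml should read:

```xml
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase</value>
</property>
```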

    You have to be careful to first stop HBase from Ambari, then make the edit in hbase-site.xml, and then manually start the HBase daemons. (Every time you start HBase from Ambari, the value in hbase-site.xml is overridden and the client connection will not work.)
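A sketch of that stop/edit/start sequence, assuming the standard daemon scripts shipped under /usr/lib/hbase/bin in the sandbox:

```shell
# 1) Stop the HBase service from the Ambari UI first.

# 2) Make the edit while HBase is down:
vi /usr/lib/hbase/conf/hbase-site.xml   # set zookeeper.znode.parent to /hbase

# 3) Start the daemons by hand so Ambari does not rewrite the config:
/usr/lib/hbase/bin/hbase-daemon.sh start master
/usr/lib/hbase/bin/hbase-daemon.sh start regionserver
```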

    Hope this helps somebody avoid the frustrations I had to go through :)


    Sasha J

    Thank you very much for sharing your experience!


The forum ‘Pig’ is closed to new topics and replies.
