Error reading file from HDFS and storing in HBase using Pig

This topic contains 5 replies, has 3 voices, and was last updated by Sasha J 1 year, 1 month ago.

  • Creator
    Topic
  • #29992

    Hi,
    I am trying to write a Pig script in Hortonworks Sandbox 1.3 that reads a file from HDFS and stores it in an HBase table. I have it working successfully in local mode, but I get an error when I launch Pig in pseudo-distributed mode in the sandbox:

    Here is the script:
    —————————————————————————————————————————————
    raw_data = LOAD '/user/krishna/data/pig/customers.csv' USING PigStorage(',') AS (cust_id:chararray, fname:chararray, lname:chararray, city:chararray, obese:chararray, terminal_patient:chararray);

    STORE raw_data INTO 'hbase://customer' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('demographic:first_name demographic:last_name demographic:city health:obese health:terminal_patient');
    —————————————————————————————————————————————

    I get the following error:
    —————————————————————————————————————————————
    2013-07-24 08:54:40,660 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
    2013-07-24 08:54:40,660 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307240711_0003 has failed! Stop running all dependent jobs
    2013-07-24 08:54:40,660 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2013-07-24 08:54:40,665 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
    2013-07-24 08:54:40,666 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    1.2.0.1.3.0.0-107 0.11.1.1.3.0.0-107 root 2013-07-24 08:53:46 2013-07-24 08:54:40 UNKNOWN

    Failed!

    Failed Jobs:
    JobId Alias Feature Message Outputs
    job_201307240711_0003 raw_data MAP_ONLY Message: Job failed! Error - JobCleanup Task Failure, Task: task_201307240711_0003_m_000001 hbase://customer,

    Input(s):
    Failed to read data from "/user/root/customers.csv"

    Output(s):
    Failed to produce result in "hbase://customer"
    —————————————————————————————————————————————

    The file does exist in HDFS; I have had the Pig script successfully read it and dump it to another HDFS file and to the console.
    The problem is only when I try to STORE it to HBase; that is when Pig no longer seems to know how to read from HDFS.
    I had to set the correct HADOOP_HOME and HBASE_HOME environment variables before launching Pig to be able to run the Grunt commands above.
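
    For reference, here is roughly what I set before launching Pig; the paths below are the usual Sandbox locations, so treat them as an assumption and adjust if yours differ:

    export HADOOP_HOME=/usr/lib/hadoop   # assumed Sandbox location of the Hadoop install
    export HBASE_HOME=/usr/lib/hbase     # assumed Sandbox location of the HBase install
    pig                                  # launch the Grunt shell in MapReduce mode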

    Any help is appreciated. I would like to know whether this is a bug in the Sandbox 1.3 environment.

  • Author
    Replies
  • #30619

    Sasha J
    Moderator

    Krishna,
    thank you very much for sharing your experience!

    Sasha

    #30055

    I was finally able to resolve the problem after spending countless hours trying various things and googling related issues others have faced. Here is the solution for anyone who runs into a similar problem:

    In the HDP 1.3 Sandbox, before venturing into Pig development that uses HBaseStorage (or other serious HBase development), you have to do the following to be able to read from and write to HBase tables successfully:

    1) Edit /usr/lib/pig/conf/pig-env.sh
    Add: export HBASE_HOME=/usr/lib/hbase

    2) Edit /usr/lib/hadoop/hadoop-env.sh
    Add:
    HBASE_JARS=
    for f in $HBASE_HOME/lib/*.jar; do
      HBASE_JARS=${HBASE_JARS}:$f
    done
    export HADOOP_CLASSPATH=$HBASE_JARS:$HADOOP_CLASSPATH

    Reboot the VM so that the new classpath takes effect for all the MapReduce nodes:
    $ shutdown -r now
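
    After the reboot, a quick sanity check (optional, just what I would suggest) is to print the Hadoop classpath and confirm the HBase jars show up on it:

    hadoop classpath | tr ':' '\n' | grep -i hbase   # the jars from /usr/lib/hbase/lib should appear here if the change was picked up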

    3) Edit /usr/lib/hbase/conf/hbase-site.xml

    Edit the value of the property "zookeeper.znode.parent"
    from "/hbase-unsecure" to "/hbase".
    Without this change you would not be able to retrieve the MasterServer from the ZooKeeper connection.
    I am using the HareDB HBase Client to visualize my HBase tables and data, and it would not connect to the HBase server without this change; I think this is some kind of compatibility issue between the various components.

    Be careful about the order: first stop HBase from Ambari, then make the edit in hbase-site.xml, and then manually start the HBase daemons (start-hbase.sh); a rough command sequence is sketched below. Every time you use Ambari to start HBase, the value in hbase-site.xml will be overridden and the client connection will not work.
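
    Roughly, the manual sequence looks like this (a sketch only; the paths assume the usual Sandbox layout, and HBase should already be stopped from the Ambari UI):

    vi /usr/lib/hbase/conf/hbase-site.xml    # set zookeeper.znode.parent to /hbase
    /usr/lib/hbase/bin/start-hbase.sh        # start the HBase daemons manually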

    Hope this helps somebody avoid the frustrations I had to go through :)

    Krishna

    #30003

    Ted,
    By the way, if I change the script's STORE command to write to HDFS using PigStorage instead of HBaseStorage, everything works fine. So, strangely, having the HBase STORE command makes Pig unable to read the HDFS file.

    Krishna

    #30001

    Hi Ted,
    I uploaded the file to HDFS using the Hue GUI. I see that the file belongs to user "hue" and group "hdfs" with the following permissions (-rwxr-xr-x).

    I am SSH'ed in as "root" before launching Pig for the Grunt shell. Do you suggest launching the Pig script as "hue" to make sure it has access to the HDFS file? Is there a way to assign permissions to files in HDFS?
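
    (For reference, it looks like permissions can be changed with the usual hadoop fs commands; a rough sketch, assuming the path from my script and that the commands are run as a user allowed to change the file, e.g. the HDFS superuser:)

    hadoop fs -chown root:hdfs /user/krishna/data/pig/customers.csv   # change the owner (superuser only)
    hadoop fs -chmod 644 /user/krishna/data/pig/customers.csv         # adjust the permission bits if needed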

    Krishna

    #29999

    tedr
    Moderator

    Hi Krishna,

    This looks like it could be due to permissions. What user is the script being run as? Does that user have permission to read the file?
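
    For example, you could check with something like this (just a sketch, using the path from your script):

    hadoop fs -ls /user/krishna/data/pig/customers.csv   # shows the file's owner, group, and permission bits
    whoami                                               # the user the Pig script is running as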

    Thanks,
    Ted.
