Loading into HBase table using Pig fails

This topic contains 7 replies, has 7 voices, and was last updated by Scott Saufferer 3 months, 2 weeks ago.

  • Creator
    Topic
  • #44509

    Anand M
    Participant

    Hello,

I need to know the steps to integrate Pig with HBase.
I have an 8-node cluster running HDP 2.0, and the installation was done through Ambari.

I am unable to load data into an HBase table using Pig. Any help would be appreciated.

    Regards
    -Anand


  • Author
    Replies
  • #58641

    Scott Saufferer
    Participant

Also, using the 2.1 sandbox environment to test loading into HBase via Pig, per the title of this thread, fails. I followed the "Using Pig to Bulk Load Data Into HBase" instructions elsewhere on this site, and it fails with the log message shown below.

A = LOAD 'hdfs:///tmp/data.tsv' USING PigStorage('\t') AS (id:chararray, c1:chararray, c2:chararray);
--DUMP A;
STORE A INTO 'simple_hcat_load_table' USING org.apache.hcatalog.pig.HCatStorer();

2014-08-12 16:38:41,719 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-08-12 16:38:41,737 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2998: Unhandled internal error. org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$MutationType
    Details at logfile: /hadoop/yarn/local/usercache/hue/appcache/application_1407882621600_0006/container_1407882621600_0006_01_000002/pig_1407886546063.log
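
For what it's worth, the one workaround I've seen suggested for this class of error (untested here; the invocation below is a sketch that assumes the sandbox's stock jar layout) is to put the HBase jars, hbase-protocol in particular, on the classpath before launching, and to start Pig with the HCatalog jars loaded:

# Hedged sketch: `hbase classpath` prints the full HBase client classpath,
# and -useHCatalog pulls in the Hive/HCatalog jars so HCatStorer can resolve.
export HADOOP_CLASSPATH="$(hbase classpath):$HADOOP_CLASSPATH"
pig -useHCatalog bulk_load.pig   # bulk_load.pig = the three-line script above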

    #58639

    Scott Saufferer
    Participant

I'm working with the current HDP 2.1 sandbox and cannot get reads from or writes into HBase either. I used the Pig script below to read from the ambarismoketest table, and I get what looks to be a class-not-found exception when reading.

raw = LOAD 'hbase://ambarismoketest' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('family:col01', '-loadKey true -limit 5') AS (first_name:chararray);
    dump raw;

2014-08-12 16:18:08,159 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias raw. Backend error : java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.mapreduce.TableSplit not found

    This is out of the box. :( I’ll try a write example next.
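
If it's purely a jar-visibility problem, one thing worth trying first (untested; the jar paths below are my guesses for the sandbox layout) is registering the HBase dependencies at the top of the script, so Pig ships them to the backend tasks:

REGISTER /usr/lib/hbase/lib/hbase-common-*.jar;   -- core HBase classes
REGISTER /usr/lib/hbase/lib/hbase-client-*.jar;   -- client API
REGISTER /usr/lib/hbase/lib/hbase-server-*.jar;   -- contains mapreduce.TableSplit
REGISTER /usr/lib/hbase/lib/hbase-protocol-*.jar; -- protobuf-generated classes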

    #54133

    Bob Russell
    Participant

    Rahman,

Would you be able to post your environment? I have an Ambari-installed cluster, and my insert into HBase is failing with a TableInputFormat class-not-found exception.

    #49775

    Stanley Nguyen
    Participant

    Hi Tom,

Any luck resolving the issue? I ran into a similar issue, but it fails at the dump statement. Running from the console works fine, and so does running from the sandbox, so perhaps some additional configuration is required for a multi-host setup.

    #46589

    Tom Debus
    Participant

    Hi Guys,

I seem to have the same issue with the standard out-of-the-box VM (2.0): all the Hive tutorials work fine, but the Pig loading of the sample NYSE stock file fails. Loading other existing or newly added tables seems to fail as well. Happy to attach the log. Or do I need to follow the same procedures to update HBase or Pig?

    Thx
    -Tom

    #46576

Hi Anand,
I don't know about an 8-node cluster, but I have done this on my single-node cluster using the settings below, which may help you sort out the problem.
1) Copy these files to the Hadoop library:

sudo cp /usr/lib/hive/lib/hive-common-0.7.0-cdh3u0.jar /usr/lib/hadoop/lib/
sudo cp /usr/lib/hive/lib/hbase-0.90.1-cdh3u0.jar /usr/lib/hadoop/lib/

2) Stop HBase and Hadoop using the following commands:
/usr/lib/hbase/bin/stop-hbase.sh
/usr/lib/hadoop/bin/stop-all.sh

3) Restart Hadoop and HBase using the following commands:
/usr/lib/hadoop/bin/start-all.sh
/usr/lib/hbase/bin/start-hbase.sh

In order to create a new HBase table which is to be managed by Hive, use the STORED BY clause on CREATE TABLE.
On the Hive shell (hive>):
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");

    After executing the command above, you should be able to see the new (empty) table in the HBase shell:
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.20.3, r902334, Mon Jan 25 13:13:08 PST 2010
hbase(main):001:0> list
xyz
1 row(s) in 0.0530 seconds
hbase(main):002:0> describe "xyz"
DESCRIPTION ENABLED
{NAME => 'xyz', FAMILIES => [{NAME => 'cf1', COMPRESSION => 'NONE', VE true
RSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0220 seconds
hbase(main):003:0> scan "xyz"
ROW COLUMN+CELL
0 row(s) in 0.0060 seconds
Notice that even though a column name "val" is specified in the mapping, only the column family name "cf1" appears in the DESCRIBE output in the HBase shell. This is because in HBase, only column families (not columns) are known in the table-level metadata; column names within a column family are only present at the per-row level.
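To make that concrete, a mapping that puts two Hive columns into the same family (a sketch with illustrative table and column names) would still show only the single family cf1 in DESCRIBE:

CREATE TABLE hbase_table_2(key int, val1 string, val2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val1,cf1:val2");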
    Here’s how to move data from Hive into the HBase table (see GettingStarted for how to create the example table pokes in Hive first):

    INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;
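If the pokes table does not exist yet, it can be created as in the Hive GettingStarted guide (assuming the example data file that ships with the Hive distribution):
CREATE TABLE pokes (foo INT, bar STRING);
LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;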
    Use HBase shell to verify that the data actually got loaded:
hbase(main):009:0> scan "xyz"
    ROW COLUMN+CELL
    98 column=cf1:val, timestamp=1267737987733, value=val_98
    1 row(s) in 0.0110 seconds
    And then query it back via Hive:
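For example (a minimal sketch, assuming the tables created above), the following should return the row with key 98 and value val_98:
SELECT * FROM hbase_table_1;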

    #45322

    abdelrahman
    Moderator

    Hi Anand,

Are there any errors in the log? I have done a functional test of Pig and HBase integration, and it works for me.

Here is my Pig script:

raw = LOAD 'hbase://ambarismoketest'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'family:col01', '-loadKey true -limit 5')
AS (first_name:chararray);

dump raw;
2013-11-06 15:09:14,906 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.2.0.2.0.6.0-76 0.12.0.2.0.6.0-76 hdfs 2013-11-06 15:06:22 2013-11-06 15:09:14 UNKNOWN

    Success!

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
    job_1383768104534_0004 1 0 16 16 16 16 n/a n/a n/a n/a raw MAP_ONLY hdfs://HDP.koelli.localdomain:8020/tmp/temp-807864500/tmp-1599452644,

Input(s):
Successfully read 1 records (342 bytes) from: "hbase://ambarismoketest"

Output(s):
Successfully stored 1 records (12 bytes) in: "hdfs://HDP.koelli.localdomain:8020/tmp/temp-807864500/tmp-1599452644"

    Counters:
    Total records written : 1
    Total bytes written : 12
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    job_1383768104534_0004

2013-11-06 15:09:15,137 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-11-06 15:09:15,141 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2013-11-06 15:09:15,142 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2013-11-06 15:09:15,178 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-11-06 15:09:15,178 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    (row01)

    Thanks
    -Rahman
