HBase table record version remains 1 – BulkLoad

This topic contains 3 replies, has 2 voices, and was last updated by  Devaraj Das 2 months, 1 week ago.

    #47897

    Anand M
    Participant

    Hello,

    Can someone please tell me whether an HBase bulk load using importtsv and loadincrementalfiles (completebulkload) results in only a single version of each record (the latest) being retained in the HBase table?

    Description of the table:

    'Weather', {NAME => 'weatherinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

    Thanks
    -Anand

Viewing 3 replies - 1 through 3 (of 3 total)

    #48397

    Devaraj Das
    Participant

    Here is what I did to try to reproduce the problem (I could not reproduce it).
    1. Created the table with VERSIONS => 3. Here is the 'describe' output:

    hbase(main):004:0> describe 'weather'
    DESCRIPTION                                                                  ENABLED
    'weather', {NAME => 'weatherinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}    true

    2. Imported 2 files with 1 row each. The first file contains "1,dd2" and the second contains "1,dd3". I ran two separate imports for the same row to make sure the cell timestamps are different.
    3. Loaded the HFiles.

    4. When I scan, I do see both versions:

    hbase(main):003:0> scan 'weather',{TIMERANGE=>[0,11111111111111],VERSIONS=>2}
    ROW COLUMN+CELL
    1 column=weatherinfo:wban, timestamp=1392004141445, value=dd2
    1 column=weatherinfo:wban, timestamp=1392004056051, value=dd3
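
The behaviour in the steps above can be sketched with a toy model. This is not HBase code: the `VersionedCell` class and its methods are made up for illustration, and it simplifies things by letting the later write win when two writes share a timestamp. It only shows why two imports with different timestamps give two versions, while two writes with the same timestamp leave one.

```python
# Toy model of version retention for one HBase cell (row + column).
# Hypothetical names; this is an illustration, not the HBase API.

class VersionedCell:
    def __init__(self, max_versions):
        self.max_versions = max_versions  # mirrors VERSIONS => '3' on the column family
        self.versions = {}                # timestamp -> value

    def put(self, timestamp, value):
        # Simplification: a write with an existing timestamp replaces
        # that version instead of adding a new one.
        self.versions[timestamp] = value

    def get(self, versions=1):
        # Newest first, capped by both the request and the family setting.
        limit = min(versions, self.max_versions)
        newest_first = sorted(self.versions.items(), reverse=True)
        return newest_first[:limit]


# Two separate imports, so two different timestamps (values from the scan above).
two_imports = VersionedCell(max_versions=3)
two_imports.put(1392004056051, "dd3")
two_imports.put(1392004141445, "dd2")
print(two_imports.get(versions=2))

# The same cell written twice with one shared timestamp.
same_timestamp = VersionedCell(max_versions=3)
same_timestamp.put(1392004056051, "dd2")
same_timestamp.put(1392004056051, "dd3")
print(same_timestamp.get(versions=2))
```

The first `print` shows two versions and the second shows one, even though both cells allow up to 3.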

    #48383

    Anand M
    Participant

    Yes, I am NOT seeing the versioned rows being retained. I see just 1 version, and it is the latest one.

    I think I am making a mistake, but I don't know where.

    The steps I follow are below:
    1. The importtsv call:

    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.bulk.output=/tmp/weather-hfile -Dimporttsv.columns=HBASE_ROW_KEY,weatherinfo:wban,weatherinfo:date,weatherinfo:temp,weatherinfo:dewp,weatherinfo:slp,weatherinfo:stp,weatherinfo:visib,weatherinfo:wdsp,weatherinfo:mxspd,weatherinfo:gust,weatherinfo:max,weatherinfo:min,weatherinfo:prcp,weatherinfo:sndp,weatherinfo:frshtt Weather /tmp/weatherdata_1GB.csv

    2. The step above writes its output in HFileOutputFormat, which is then loaded into the HBase table using completebulkload:

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/weather-hfile Weather
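
One thing that may be worth checking, offered as an assumption rather than something confirmed in this thread: if a single importtsv run stamps every cell it writes with the same timestamp (the other reply used two separate imports to get different timestamps), then row keys repeated inside one input file would end up as a single version. A minimal sketch for counting repeated row keys follows. The function name and sample data are hypothetical, and it assumes the row key is the first field and that fields are split on a plain separator with no quoting.

```python
from collections import Counter
import io


def count_repeated_row_keys(lines, separator=","):
    """Return {row_key: count} for row keys that occur more than once.

    Assumes the row key is the first field (HBASE_ROW_KEY listed first
    in importtsv.columns) and that fields are split on a plain separator.
    """
    counts = Counter(
        line.rstrip("\n").split(separator, 1)[0]
        for line in lines
        if line.strip()  # skip blank lines
    )
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data, not taken from weatherdata_1GB.csv.
sample = io.StringIO("1,dd2\n1,dd3\n2,dd4\n")
print(count_repeated_row_keys(sample))
```

To run it against real input, pass an open file handle for a local copy of the CSV instead of the `io.StringIO` sample. An empty result would mean no row key repeats within that file.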

    #48208

    Devaraj Das
    Participant

    From what I can tell, it should retain all the versions (3 in your case). Are you seeing otherwise?
