HBase table record version remains 1 – BulkLoad

This topic contains 3 replies, has 2 voices, and was last updated by Devaraj Das 10 months, 1 week ago.

  • Creator
  • #47897

    Anand M


    Can someone please tell me whether an HBase bulk load using importtsv and LoadIncrementalHFiles (completebulkload) results in only a single version of each record (the latest) being retained in the HBase table?

    Description of the table:

    'Weather', {NAME => 'weatherinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}


Viewing 3 replies - 1 through 3 (of 3 total)


  • Author
  • #48397

    Devaraj Das

    Here is what I did to try to reproduce the problem (I could not reproduce it).
    1. Created the table with VERSIONS = 3. Here is the 'describe' output:

    hbase(main):004:0> describe 'weather'
    'weather', {NAME => 'weatherinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'} true

    2. Imported 2 files with 1 row each. The first file contains "1,dd2" and the second "1,dd3". I ran two separate imports for the same row to make sure the cell timestamps are different.
    3. Loaded the HFiles.

    4. Then, when I scan, I do see both versions:

    hbase(main):003:0> scan 'weather',{TIMERANGE=>[0,11111111111111],VERSIONS=>2}
    1 column=weatherinfo:wban, timestamp=1392004141445, value=dd2
    1 column=weatherinfo:wban, timestamp=1392004056051, value=dd3
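
    The scan result above can be modeled with a small sketch. This is a toy model in Python, not HBase code: it assumes only that the versions of a cell are distinguished by timestamp and that a scan with VERSIONS=>2 returns the two newest. The helper names load and scan are hypothetical; the timestamps and values are the ones from the scan output.

```python
# Toy model (not HBase code): versions of one cell, keyed by timestamp.
cell_versions = {}

def load(timestamp, value):
    """Record one bulk-loaded value for row '1', column weatherinfo:wban."""
    cell_versions[timestamp] = value

def scan(max_versions):
    """Return up to max_versions (timestamp, value) pairs, newest first."""
    newest_first = sorted(cell_versions.items(), reverse=True)
    return newest_first[:max_versions]

# Two separate loads, so the two cells carry different timestamps.
load(1392004056051, "dd3")
load(1392004141445, "dd2")

result = scan(max_versions=2)
for ts, value in result:
    print(f"1 column=weatherinfo:wban, timestamp={ts}, value={value}")
```

    Because the two loads carry different timestamps, both values occupy separate slots and both come back from the scan.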


    Anand M

    Yes, I am NOT seeing multiple versions of a row retained. Only one version is kept, and it is the latest one.

    I think I am making a mistake, but I don't know where.

    The steps I follow are below:
    1. importtsv call

    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.bulk.output=/tmp/weather-hfile -Dimporttsv.columns=HBASE_ROW_KEY,weatherinfo:wban,weatherinfo:date,weatherinfo:temp,weatherinfo:dewp,weatherinfo:slp,weatherinfo:stp,weatherinfo:visib,weatherinfo:wdsp,weatherinfo:mxspd,weatherinfo:gust,weatherinfo:max,weatherinfo:min,weatherinfo:prcp,weatherinfo:sndp,weatherinfo:frshtt Weather /tmp/weatherdata_1GB.csv

    2. The step above writes HFiles (via HFileOutputFormat), which are then loaded into the HBase table with completebulkload:

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/weather-hfile Weather
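
    One possible explanation, which is an assumption and is not confirmed anywhere in this thread: by default ImportTsv appears to stamp every cell written by a single run with the same timestamp, so several records with the same row key in one CSV file would collide on identical (row, column, timestamp) coordinates and show up as a single version. The sketch below is a toy Python model of that behavior, not HBase code; new_table, import_run, get_versions, the sample values, and the timestamp 1000 are all hypothetical.

```python
from collections import defaultdict

MAX_VERSIONS = 3  # mirrors VERSIONS => '3' on the weatherinfo family

def new_table():
    # Maps (row key, column) -> {timestamp: value}.
    return defaultdict(dict)

def import_run(table, records, run_timestamp):
    """Toy stand-in for one ImportTsv + completebulkload run.

    Assumption: every cell written by the run carries run_timestamp.
    """
    for row_key, column, value in records:
        # Identical (row, column, timestamp) coordinates occupy a single slot.
        # Which value ends up in that slot is NOT modeled faithfully here;
        # only the number of surviving versions matters for this sketch.
        table[(row_key, column)][run_timestamp] = value

def get_versions(table, row_key, column):
    """Return up to MAX_VERSIONS (timestamp, value) pairs, newest first."""
    cells = table[(row_key, column)]
    newest = sorted(cells, reverse=True)[:MAX_VERSIONS]
    return [(ts, cells[ts]) for ts in newest]

# Hypothetical sample: three readings for the same row key and column.
readings = [("1", "weatherinfo:wban", v) for v in ("dd1", "dd2", "dd3")]

# Case A: all three readings sit in one file and are loaded in one run.
one_run = new_table()
import_run(one_run, readings, run_timestamp=1000)
versions_one_run = get_versions(one_run, "1", "weatherinfo:wban")

# Case B: each reading is loaded in its own run, so timestamps differ.
three_runs = new_table()
for offset, record in enumerate(readings):
    import_run(three_runs, [record], run_timestamp=1000 + offset)
versions_three_runs = get_versions(three_runs, "1", "weatherinfo:wban")

print(len(versions_one_run), len(versions_three_runs))  # prints: 1 3
```

    If this is the cause, giving each record its own timestamp should help: either load each version in a separate run, as in the two-import reproduction in this thread, or, in ImportTsv versions that support it, map a CSV column to HBASE_TS_KEY in -Dimporttsv.columns so that each record supplies its own timestamp.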


    Devaraj Das

    From what I can tell, it should retain all the versions (3 in your case). Are you seeing otherwise?
