HBase table record version remains 1 – BulkLoad


This topic contains 3 replies, has 2 voices, and was last updated by  Devaraj Das 1 year, 1 month ago.


    Anand M


    Can someone please tell me whether an HBase bulk load using ImportTsv (importtsv) and LoadIncrementalHFiles (completebulkload) results in only a single version of each record (the latest) being retained in the HBase table?

    Description of the table.

    'Weather', {NAME => 'weatherinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}


Viewing 3 replies - 1 through 3 (of 3 total)



    Devaraj Das

    Here is what I did to try to reproduce the problem (I couldn't).
    1. Created the table with VERSIONS => 3. Here is the 'describe' output:

    hbase(main):004:0> describe 'weather'
    'weather', {NAME => 'weatherinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}    true

    2. Ran two imports, each on a file with one row. The first file contains "1,dd2" and the second "1,dd3". I used two separate imports for the same row so that the cell timestamps would differ.
    3. Loaded the HFiles.
    4. When I scan the table, I do see both versions:

    hbase(main):003:0> scan 'weather', {TIMERANGE => [0, 11111111111111], VERSIONS => 2}
    1 column=weatherinfo:wban, timestamp=1392004141445, value=dd2
    1 column=weatherinfo:wban, timestamp=1392004056051, value=dd3
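
To make the reasoning behind the two separate imports concrete, here is a minimal sketch in plain Python. It is a toy model, not the HBase client API: the class `VersionedTable` and its methods are hypothetical names. It assumes that a version is identified by row, column and timestamp, and that every cell written by one import run carries the same timestamp. That second assumption is how I understand ImportTsv to behave by default, so please verify it against your HBase version.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model of HBase cell versioning (hypothetical, not the HBase API)."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        # (row, column) -> {timestamp: value}
        self.cells = defaultdict(dict)

    def load(self, rows, timestamp):
        """Load (row, column, value) triples, all stamped with one timestamp."""
        for row, column, value in rows:
            # Same row, column and timestamp: the later value replaces the earlier one.
            self.cells[(row, column)][timestamp] = value

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        limit = min(versions, self.max_versions)
        stamped = sorted(self.cells[(row, column)].items(), reverse=True)
        return stamped[:limit]


# Two import runs, hence two timestamps (the values 1000 and 2000 are made up).
two_runs = VersionedTable(max_versions=3)
two_runs.load([("1", "weatherinfo:wban", "dd2")], timestamp=1000)
two_runs.load([("1", "weatherinfo:wban", "dd3")], timestamp=2000)
print(two_runs.get("1", "weatherinfo:wban", versions=2))

# One import run containing the same row twice, hence a single timestamp.
one_run = VersionedTable(max_versions=3)
one_run.load(
    [("1", "weatherinfo:wban", "dd2"), ("1", "weatherinfo:wban", "dd3")],
    timestamp=1000,
)
print(one_run.get("1", "weatherinfo:wban", versions=2))
```

Under these assumptions the first table returns two versions and the second only one, even though both were created with VERSIONS = 3.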


    Anand M

    Yes, I am NOT seeing multiple versions retained. I see just one version per row, and it is the latest one.

    I think I am making a mistake somewhere, but I don't know where.

    Steps that I follow are below:
    1. importtsv call

    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.bulk.output=/tmp/weather-hfile -Dimporttsv.columns=HBASE_ROW_KEY,weatherinfo:wban,weatherinfo:date,weatherinfo:temp,weatherinfo:dewp,weatherinfo:slp,weatherinfo:stp,weatherinfo:visib,weatherinfo:wdsp,weatherinfo:mxspd,weatherinfo:gust,weatherinfo:max,weatherinfo:min,weatherinfo:prcp,weatherinfo:sndp,weatherinfo:frshtt Weather /tmp/weatherdata_1GB.csv

    2. The step above writes HFiles (via HFileOutputFormat) to /tmp/weather-hfile. These are then loaded into the HBase table with completebulkload:

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/weather-hfile Weather
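
One thing worth checking with a workflow like this is whether the input file contains the same row key more than once, since that is the only place multiple versions could come from in a single import. A rough sketch, assuming the row key is the first comma-separated field (as the HBASE_ROW_KEY position in importtsv.columns suggests); the function name `count_repeated_keys` and the sample data are hypothetical:

```python
import csv
import io
from collections import Counter


def count_repeated_keys(lines):
    """Return {row_key: occurrences} for row keys that appear more than once."""
    counts = Counter(record[0] for record in csv.reader(lines) if record)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample; in practice pass an open file such as /tmp/weatherdata_1GB.csv.
sample = io.StringIO("1,dd2\n1,dd3\n2,dd4\n")
repeated = count_repeated_keys(sample)
print(repeated)
```

If this reports repeated keys, whether they end up as separate versions depends on how the import assigns cell timestamps, which is not settled in this thread.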


    Devaraj Das

    From what I can tell, it should retain all the versions (3 in your case). Are you seeing otherwise?
