HBase table record version remains 1 – BulkLoad

This topic contains 3 replies, has 2 voices, and was last updated by  Devaraj Das 1 year ago.


    Anand M
    Participant

    Hello,

    Can someone please tell me whether an HBase bulk load using the importtsv and LoadIncrementalHFiles (completebulkload) tools results in only a single version of each record (the latest) being retained in the HBase table?

    Description of the table.

    'Weather', {NAME => 'weatherinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

    Thanks
    -Anand

Viewing 3 replies - 1 through 3 (of 3 total)

  • #48397

    Devaraj Das
    Participant

    Here is what I did to try to reproduce the problem (I couldn't reproduce it).
    1. Created the table with VERSIONS = 3. Here is the 'describe' output:

    hbase(main):004:0> describe 'weather'
    DESCRIPTION ENABLED
    'weather', {NAME => 'weatherinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', true
    COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
    IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}

    2. Imported 2 files with 1 row each. The content of the first file is "1,dd2" and of the second "1,dd3". (I ran two separate imports for the same row to make sure the cell timestamps are different.)
    3. Loaded the HFiles.

    4. Then, when I scan, I do see both versions:

    hbase(main):003:0> scan 'weather', {TIMERANGE => [0, 11111111111111], VERSIONS => 2}
    ROW COLUMN+CELL
    1 column=weatherinfo:wban, timestamp=1392004141445, value=dd2
    1 column=weatherinfo:wban, timestamp=1392004056051, value=dd3
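
The reason for running two separate imports can be illustrated with a small toy model. This is only a sketch, not HBase code: the class `ToyColumnFamily` and its methods are hypothetical names, and the model assumes that a cell is identified by (row, column, timestamp), so two writes carrying the same timestamp leave a single version behind, while writes with different timestamps are kept up to the family's VERSIONS limit.

```python
class ToyColumnFamily:
    """Toy model of per-cell version retention (hypothetical, not HBase code)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row, column) -> {timestamp: value}
        self._cells = {}

    def put(self, row, column, timestamp, value):
        # Assumption: a cell is keyed by (row, column, timestamp), so a second
        # write with the same timestamp replaces the first in this model.
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, versions=1):
        # Newest first, capped by both the request and the family limit.
        limit = min(versions, self.max_versions)
        cells = sorted(self._cells.get((row, column), {}).items(), reverse=True)
        return cells[:limit]


# Two loads with different timestamps: both versions are visible.
cf = ToyColumnFamily(max_versions=3)
cf.put("1", "weatherinfo:wban", 1392004056051, "dd3")
cf.put("1", "weatherinfo:wban", 1392004141445, "dd2")
print(cf.get("1", "weatherinfo:wban", versions=2))

# Two writes with the same timestamp: only one version remains.
same_ts = ToyColumnFamily(max_versions=3)
same_ts.put("1", "weatherinfo:wban", 100, "first")
same_ts.put("1", "weatherinfo:wban", 100, "second")
print(same_ts.get("1", "weatherinfo:wban", versions=3))
```

In the model, the first case returns both cells, newest first, which matches the scan output above. Which of two same-timestamp cells a real cluster would return is not something this sketch tries to capture.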

    #48383

    Anand M
    Participant

    Yes, I am NOT seeing multiple versions of a row retained. I get just one version, and it is the latest one.

    I think I am making a mistake somewhere, but I don't know where.

    The steps I follow are below:
    1. importtsv call

    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.bulk.output=/tmp/weather-hfile -Dimporttsv.columns=HBASE_ROW_KEY,weatherinfo:wban,weatherinfo:date,weatherinfo:temp,weatherinfo:dewp,weatherinfo:slp,weatherinfo:stp,weatherinfo:visib,weatherinfo:wdsp,weatherinfo:mxspd,weatherinfo:gust,weatherinfo:max,weatherinfo:min,weatherinfo:prcp,weatherinfo:sndp,weatherinfo:frshtt Weather /tmp/weatherdata_1GB.csv

    2. The step above writes HFiles (HFileOutputFormat) to the bulk output directory, which are then loaded into the HBase table using the completebulkload tool.

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/weather-hfile Weather
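
Because these steps load a single file in a single import, one thing that may be worth ruling out is whether the input contains more than one record per row key at all. The sketch below is a hypothetical helper (`keys_with_multiple_records` is not part of HBase) that treats the first field of each line as the row key, as in the `HBASE_ROW_KEY` column mapping above, and reports the keys that occur more than once. As far as I know, a single ImportTsv run stamps its cells with one job-wide timestamp unless `importtsv.timestamp` is set, so please treat that as an assumption to verify against your HBase version.

```python
from collections import Counter


def row_key_counts(lines, separator=","):
    """Count how many records share each row key (the first field)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        counts[line.split(separator, 1)[0]] += 1
    return counts


def keys_with_multiple_records(lines, separator=","):
    """Return only the row keys that appear in more than one record."""
    return {key: n for key, n in row_key_counts(lines, separator).items() if n > 1}


# Made-up sample data for illustration; for a real check, pass an open file
# handle for the CSV instead of this list.
sample = ["1,dd2\n", "1,dd3\n", "2,dd4\n"]
print(keys_with_multiple_records(sample))
```

If this reports no repeated keys, a single version per row is all the table could hold after one load, regardless of the VERSIONS setting.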

    #48208

    Devaraj Das
    Participant

    From what I can tell, it should retain all the versions (3 in your case). Are you seeing otherwise?
