HBase Forum

HBase table record version remains 1 – BulkLoad

  • #47897
    Anand M


    Can someone please tell me whether an HBase bulk load using importtsv and loadincrementalfiles (completebulkload) results in only a single version of each record (the latest) being retained in the HBase table?

    Description of the table:

    'Weather', {NAME => 'weatherinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
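
    For reference, a table with this schema could be created in the HBase shell roughly as follows (only the column family name and VERSIONS are taken from the describe output above; the remaining attributes are left at their defaults):

    create 'Weather', {NAME => 'weatherinfo', VERSIONS => 3}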



  • #48208
    Devaraj Das

    From what I can tell, it should retain all the versions (3 in your case). Are you seeing otherwise?
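
    One general point worth checking (this is standard HBase shell behaviour, not specific to bulk load): a plain scan or get returns only the newest cell per column, so older versions only show up when VERSIONS is passed explicitly. Something along these lines, where the row key is just a placeholder and weatherinfo:temp is one of the columns from your import:

    scan 'Weather', {VERSIONS => 3}
    get 'Weather', 'some-row-key', {COLUMN => 'weatherinfo:temp', VERSIONS => 3}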

    Anand M

    Yes, I am NOT seeing the versioned rows being retained. Just 1 version, and that is the latest one.

    I think I am making a mistake, but I don't know where.

    Steps that I follow are below:
    1. importtsv call

    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.bulk.output=/tmp/weather-hfile -Dimporttsv.columns=HBASE_ROW_KEY,weatherinfo:wban,weatherinfo:date,weatherinfo:temp,weatherinfo:dewp,weatherinfo:slp,weatherinfo:stp,weatherinfo:visib,weatherinfo:wdsp,weatherinfo:mxspd,weatherinfo:gust,weatherinfo:max,weatherinfo:min,weatherinfo:prcp,weatherinfo:sndp,weatherinfo:frshtt Weather /tmp/weatherdata_1GB.csv

    2. The above step creates HFiles (via HFileOutputFormat), which are then loaded into the HBase table with completebulkload:

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/weather-hfile Weather
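
    To see what the bulk load actually wrote, one option (assuming the HFile pretty-printer tool that ships with HBase) is to print the key/values, including their timestamps, from one of the generated files; the path below is just a placeholder for an HFile produced under /tmp/weather-hfile:

    hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f /tmp/weather-hfile/weatherinfo/<hfile-name>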

    Devaraj Das

    Here is what I did to try to reproduce the problem (I couldn't).
    1. Created a table with VERSIONS => 3. Here is the 'describe' output:

    hbase(main):004:0> describe 'weather'
    'weather', {NAME => 'weatherinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
    ENABLED: true

    2. Imported 2 files with 1 row each. The content of the first file is "1,dd2" and of the second "1,dd3" (two imports for the same row key, just to make sure the cell timestamps are different; the commands are sketched after the scan output below).
    3. Loaded the hfiles.

    4. Then, when I scan, I do see both versions:

    hbase(main):003:0> scan 'weather',{TIMERANGE=>[0,11111111111111],VERSIONS=>2}
    1 column=weatherinfo:wban, timestamp=1392004141445, value=dd2
    1 column=weatherinfo:wban, timestamp=1392004056051, value=dd3
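
    For completeness, the import and load commands for steps 2 and 3 might have looked roughly like this (the file names, output directories and the single weatherinfo:wban column are assumptions based on the commands earlier in this thread):

    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.bulk.output=/tmp/weather-hfile-1 -Dimporttsv.columns=HBASE_ROW_KEY,weatherinfo:wban weather /tmp/file1.csv
    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/weather-hfile-1 weather

    (and then the same again for the second file, with a different bulk.output directory)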

