
HBase Forum

Scan returns only one version of a cell. How can I scan all versions?

  • #50753
    Joseph Hwang
    Participant

    I use HBase 0.96 on Hadoop 2.2, and I am writing a MapReduce job with the TableMapReduceUtil.initTableMapperJob method.
    One cell in my HBase column family has almost 200 versions. Below are my data format, driver, and map function code.

    Data =========
    Brazil column=INTLCTRY_DATA:age, timestamp=1396002150554, value=Aged 15-24
    Brazil column=INTLCTRY_DATA:average, timestamp=1396002150554, value=3831000.0
    Brazil column=INTLCTRY_DATA:data, timestamp=1377961200000, value=3831000.0 <= (this cell has 200 timestamped versions)
    Brazil column=INTLCTRY_DATA:freq, timestamp=1396002150554, value=\x00
    Brazil column=INTLCTRY_DATA:sex, timestamp=1396002150554, value=All Persons
    Brazil column=INTLCTRY_DATA:title, timestamp=1396002150554, value=Active Population

    Driver class=====
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        Job job = Job.getInstance(config, "HBase MapReduce Test");
        job.setJarByClass(MyDriver.class);

        Scan scan = new Scan();
        scan.setMaxResultSize(200);  // note: this limit is in bytes, not in cells
        scan.setCaching(1000);
        scan.setCacheBlocks(false);  // recommended for MapReduce scans
        scan.setMaxVersions();       // request all versions from the scanner

        TableMapReduceUtil.initTableMapperJob("INTLCTRY_TABLE", scan, MyMapper.class, Text.class, FloatWritable.class, job);
        ….

    Map class======
    import java.io.IOException;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.Text;

    public class MyMapper extends TableMapper<Text, FloatWritable> {

        private final byte[] COLUMN_FAMILY = Bytes.toBytes("INTLCTRY_DATA");
        private Text key = new Text();
        private FloatWritable output = new FloatWritable();

        @Override
        public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
            String bCntyName = new String(value.getRow());
            String bTitle = new String(value.getValue(COLUMN_FAMILY, Bytes.toBytes("title")));
            String bAgeRange = new String(value.getValue(COLUMN_FAMILY, Bytes.toBytes("age")));
            String bSex = new String(value.getValue(COLUMN_FAMILY, Bytes.toBytes("sex")));
            char bFreq = new String(value.getValue(COLUMN_FAMILY, Bytes.toBytes("freq"))).charAt(0);
            // the three qualifiers below are assumed from the variable names; adjust to the actual schema
            String bUnit = new String(value.getValue(COLUMN_FAMILY, Bytes.toBytes("unit")));
            String bSeasonalAdj = new String(value.getValue(COLUMN_FAMILY, Bytes.toBytes("seasonaladj")));
            String bUpdateDate = new String(value.getValue(COLUMN_FAMILY, Bytes.toBytes("updatedate")));

            key.set(bCntyName + "," + bTitle + "," + bAgeRange + "," + bSex + "," + bUnit + "," + bFreq + "," + bSeasonalAdj + "," + bUpdateDate);
            System.out.println("LENGTH : " + value.listCells().size()); // length is NOT 200, only 9

            for (Cell c : value.rawCells()) {
                String qualifier = new String(CellUtil.cloneQualifier(c));
                if (qualifier.equals("data")) {
                    Float f = Float.parseFloat(new String(CellUtil.cloneValue(c))); // parses only 1 value, not all values
                    output.set(f);
                    context.write(key, output);
                }
            }
        }
    }

    It seems the Result contains only the newest version of each cell, not all versions. How can I scan all cell versions in my HBase map function?
    Please give me your advice! Thanks in advance.
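    A minimal sketch of what the loop above is after, assuming the scan really does return multiple versions: Result#getColumnCells (0.96 client API) returns every returned version of one column, newest first. This would replace the rawCells() loop inside map(), reusing COLUMN_FAMILY, key, output, and value from the mapper (it also needs import java.util.List;):

    // read every returned version of the "data" column, not just the newest
    List<Cell> dataVersions = value.getColumnCells(COLUMN_FAMILY, Bytes.toBytes("data"));
    for (Cell c : dataVersions) {
        // c.getTimestamp() gives the version's timestamp, if needed
        float f = Float.parseFloat(new String(CellUtil.cloneValue(c)));
        output.set(f);
        context.write(key, output); // one output pair per returned version
    }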

  • #51315
    Nick Dimiduk
    Moderator

    You need to configure your scanner to retrieve historical versions with something like Scan#setMaxVersions(). This is disabled by default. The Versions section of our online documentation includes more information.

    Cheers,
    Nick
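
    One more thing worth checking, since the driver above already calls scan.setMaxVersions(): the column family itself must also retain old versions. In HBase 0.96 a column family keeps only one version by default (VERSIONS => 1; earlier releases defaulted to three), so older versions are discarded at flush/compaction time no matter what the scan requests. A minimal sketch of raising the retained version count with the 0.96-era client API; the table and family names are taken from the post above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RaiseVersions {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            try {
                // start from the current descriptor so other family settings are preserved
                HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("INTLCTRY_TABLE"));
                HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("INTLCTRY_DATA"));
                cf.setMaxVersions(200); // retain up to 200 versions per cell
                admin.disableTable("INTLCTRY_TABLE");
                admin.modifyColumn("INTLCTRY_TABLE", cf);
                admin.enableTable("INTLCTRY_TABLE");
            } finally {
                admin.close();
            }
        }
    }

    The equivalent HBase shell command is alter 'INTLCTRY_TABLE', {NAME => 'INTLCTRY_DATA', VERSIONS => 200}. Note that versions already discarded by a compaction cannot be recovered; only cells written while the higher limit is in effect will accumulate multiple versions.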

