HBase Forum

Querying HBase Table

  • #44878
    Zin Zin

    I am trying to write a simple filter to be invoked from HBase Shell

    create 'Acc','c1'
    put 'Acc','1','c1:id','100'
    put 'Acc','2','c1:id','200'
    put 'Acc','3','c1:id','300'
    put 'Acc','4','c1:id','400'
    put 'Acc','5','c1:id','500'
    put 'Acc','6','c1:id','600'
    put 'Acc','1','c1:val','100'
    put 'Acc','2','c1:val','200'
    put 'Acc','3','c1:val','300'
    put 'Acc','4','c1:val','400'
    put 'Acc','5','c1:val','500'
    put 'Acc','6','c1:val','600'

    Now I want to filter all rows where c1:val >= 300

    How would I do it?


  • #44881

    Hi Paulie,

    Try this:

    scan 'Acc', {COLUMNS => ['c1'], FILTER => "SingleColumnValueFilter('c1','val',>=,'binary:300')"}
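    One caveat worth noting: the `binary` comparator compares raw bytes lexicographically, not numerically, so the filter above behaves as expected only because every value in this table is a three-digit string. A small Python sketch of the same byte-wise comparison shows where that can surprise you (the `binary_ge` helper here is illustrative, not part of any HBase API):

    ```python
    # HBase's binary comparator does a lexicographic, byte-by-byte
    # comparison -- the same ordering Python uses for bytes objects.
    def binary_ge(value: bytes, threshold: bytes) -> bool:
        return value >= threshold

    # Fixed-width values compare as expected:
    assert binary_ge(b"300", b"300")
    assert not binary_ge(b"200", b"300")

    # But a longer numeric string can sort *below* a shorter one,
    # because '1' < '3' as bytes:
    assert not binary_ge(b"1000", b"300")

    # Zero-padding values to a fixed width restores numeric ordering:
    assert binary_ge(b"1000", b"0300")
    ```

    If your values may grow past three digits, store them zero-padded (or as fixed-width binary) so byte order matches numeric order.
    
    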

    Zin Zin

    Thanks, that worked. How would you do sum, group by, etc.? Do you need to use MapReduce for that?


    Sum and group-by require an additional layer on top of HBase. There are several options available, such as Apache Phoenix (https://github.com/forcedotcom/phoenix) or the Hive-HBase integration: http://hortonworks.com/blog/hbase-via-hive-part-1.
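    For a small result set you can also aggregate client-side after a scan, which is essentially what a `SUM ... GROUP BY` in Phoenix or Hive does for you at scale. A minimal Python sketch, assuming the rows have already been fetched by some client (e.g. the Thrift-based happybase library, not shown) and using a hypothetical `c1:grp` grouping column that is not in the thread's table:

    ```python
    from collections import defaultdict

    # Scan results as a dict: row key -> {column: value}.
    # "c1:grp" is a hypothetical grouping column for illustration;
    # the table in this thread only has c1:id and c1:val.
    rows = {
        b"1": {b"c1:grp": b"a", b"c1:val": b"100"},
        b"2": {b"c1:grp": b"a", b"c1:val": b"200"},
        b"3": {b"c1:grp": b"b", b"c1:val": b"300"},
        b"4": {b"c1:grp": b"b", b"c1:val": b"400"},
    }

    def sum_group_by(rows, group_col, value_col):
        """Client-side SUM(value_col) GROUP BY group_col over scanned rows."""
        totals = defaultdict(int)
        for cells in rows.values():
            totals[cells[group_col]] += int(cells[value_col])
        return dict(totals)

    print(sum_group_by(rows, b"c1:grp", b"c1:val"))
    # {b'a': 300, b'b': 700}
    ```

    This pulls every row to the client, so it only makes sense for small scans; Phoenix, Hive, or MapReduce push the aggregation to the cluster instead.
    
    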

