The Hortonworks Blog

More from Enis Soztutar

The third HBaseCon, THE community event for Apache HBase, takes place on May 5th this year in San Francisco. As in previous years, the agenda is quite exciting.

There will be four tracks: Operations, Features and Internals, Ecosystem, and Case Studies. The keynotes will include speakers from Cloudera, the event host; the Google BigTable team, as a follow-up to their ’06 BigTable paper; Salesforce, on their experience with HBase operations and use cases; and Facebook, on their strongly consistent multi-data-center replication scheme.…

With over 230 JIRA tickets resolved, the Apache HBase community released 0.98.0 yesterday, the next major version after the 0.96.x series.

HBase 0.98.0 comes with an exciting set of new features while keeping the stability improvements and features of 0.96. In addition to the usual bug fixes, some of the major improvements include:

  • Reverse Scans (HBASE-4811): for use cases where both forward and reverse iteration are required, HBase now supports scans in reverse order.

Last week, the HBase community released 0.94.5, which is the most stable release of HBase so far. The release resolves 76 JIRA issues, including 61 bug fixes, 8 improvements, and 2 new features.

Most of the bug fixes addressed the REST server, replication, region assignment, the secure client, flaky unit tests, 0.92 compatibility, and various stability issues. Some of the interesting patches in this release are:
[HBASE-3996] – Support multiple tables and scanners as input to the mapper in map/reduce jobs
[HBASE-5416] – Improve performance of scans with some kind of filters.…

For this post, we take a technical deep-dive into one of the core areas of HBase. Specifically, we will look at how Apache HBase distributes load through regions and manages region splitting. HBase stores rows of data in tables. Tables are split into chunks of rows called “regions”. Those regions are distributed across the cluster, and are hosted and made available to client processes by the RegionServer process. A region is a contiguous range within the key space, meaning all rows in the table that sort between the region’s start key and end key are stored in the same region.…
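
To make the idea of a region as a key range concrete, here is a minimal sketch in plain Java. It is not the HBase client API: the class `RegionLookup`, its methods, and the server names `rs1` to `rs3` are hypothetical, and it assumes ASCII row keys so that Java string ordering matches byte-wise lexicographic ordering. Each region is recorded by its start key in a sorted map, and the region holding a row is the one with the greatest start key that is less than or equal to the row key.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only: maps region start keys to the (hypothetical)
// server hosting that region. A region covers [startKey, nextStartKey).
class RegionLookup {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    // Register a region by its inclusive start key. The first region of a
    // table starts at the empty key, which sorts before every other key.
    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // Find the server for a row: the region with the greatest start key
    // that is <= the row key contains that row.
    String locate(String rowKey) {
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookup table = new RegionLookup();
        table.addRegion("", "rs1");   // rows before "g"
        table.addRegion("g", "rs2");  // rows from "g" up to, not including, "p"
        table.addRegion("p", "rs3");  // rows from "p" onward

        System.out.println(table.locate("apple")); // rs1
        System.out.println(table.locate("grape")); // rs2
        System.out.println(table.locate("zebra")); // rs3
    }
}
```

Because every row between a region's start key and end key lands in the same region, a lookup only needs the sorted list of start keys, which is why a sorted map is a natural fit for this sketch.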