HBase at Hortonworks: An Update
HBase is a critical component of the Apache Hadoop ecosystem and a core component of the Hortonworks Data Platform. HBase enables a host of low-latency Hadoop use cases. As a publishing platform, HBase exposes data refined in Hadoop to outside systems. As an online column store, HBase supports the blending of random-access data reads and writes with application workloads whose data is directly accessible to Hadoop MapReduce.
The HBase community is moving forward aggressively, improving HBase in many ways. We are in the process of integrating HBase 0.94 into our upcoming HDP 1.1 refresh. This “minor upgrade” will include nearly 200 bug fixes and quite a few performance improvements, and it will be wire compatible with HBase 0.92 (in HDP 1.0). Here are some of the notable changes:
- HBASE-4128 – Data Block Encoding of KeyValues (aka delta encoding / prefix compression) [PERFORMANCE]
- HBASE-4465 – Lazy-seek optimization for StoreFile scanners [PERFORMANCE]
- HBASE-5074 – support checksums in HBase block cache [PERFORMANCE]
- HBASE-5128 – [uber hbck] Online automated repair of table integrity and region consistency problems [OPERABILITY]
- HBASE-3584 – Allow atomic put/delete in one call [FEATURE]
- HBASE-5229 – Provide basic building blocks for “multi-row” local transactions [FEATURE]
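To give a feel for what prefix compression (HBASE-4128) buys, here is a minimal sketch of the general idea: because keys in a block are sorted, each key can be stored as "number of leading characters shared with the previous key" plus the remaining suffix. This is a toy illustration only. The class and method names are hypothetical, it works on plain strings rather than KeyValues, and it does not reflect HBase's actual encoder or on-disk format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy prefix compression over sorted keys. Hypothetical names; not HBase's encoder. */
final class PrefixEncodingSketch {

    /** One encoded entry: length of the prefix shared with the previous key, plus the rest. */
    static final class Entry {
        final int sharedPrefixLength;
        final String suffix;

        Entry(int sharedPrefixLength, String suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Encodes each key relative to the key immediately before it. */
    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String previous = "";
        for (String key : sortedKeys) {
            int limit = Math.min(previous.length(), key.length());
            int shared = 0;
            while (shared < limit && previous.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, key.substring(shared)));
            previous = key;
        }
        return out;
    }

    /** Rebuilds the original keys by replaying the shared prefixes in order. */
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String previous = "";
        for (Entry e : entries) {
            String key = previous.substring(0, e.sharedPrefixLength) + e.suffix;
            out.add(key);
            previous = key;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up keys; adjacent rows and columns share long prefixes.
        List<String> keys = Arrays.asList("row0001/cf:a", "row0001/cf:b", "row0002/cf:a");
        for (Entry e : encode(keys)) {
            System.out.println(e.sharedPrefixLength + " + \"" + e.suffix + "\"");
        }
    }
}
```

The longer the shared row and column-family prefixes, the less of each key needs to be stored, which is why this kind of encoding helps most on tables with long, repetitive keys.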
And 0.94 is only the start. Expect to see a huge set of additional features, bug fixes, and performance and operational improvements to HBase in the coming months. As more of our customers have deployed HBase, it has become an increasingly important component of HDP 1. As a result, we’ve really been ramping up our investment in HBase this year, with a focus on enhancing HBase stability and operability. What follows is a summary of Hortonworkers’ recent HBase contributions.
1. Reliability improvements
We have established an automated test harness for testing HBase on a nightly basis. The harness involves automated deployment of HBase with a ‘production like’ configuration. After the cluster has been set up, a few heavy duty jobs are run. This has uncovered numerous bugs in the 0.92.x line.
Some of them are:
- HBASE-5986: Clients can see holes in the META table when regions are being split
- HBASE-6160: META entries from daughters can be deleted before parent entries
- HBASE-6679: RegionServer aborts due to race between compaction and split
- HBASE-6060: Regions’s in OPENING state from failed regionservers takes a long time to recover
- HBASE-6649: TestReplication.queueFailover occasionally fails [Part-1]
- HBASE-6758: The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
2. Test Infrastructure Improvements
One of the biggest needs in the community is a good testing framework for HBase. As HBase is becoming more popular as a NoSQL data store, we need to make sure that the system is highly available and reliable in the face of common node failures, and that it is able to withstand the intense, high stress workloads users expect in production environments.
Towards this end, we have been building an automated test framework inspired by Netflix’s ChaosMonkey tool. It can run a series of tests while killing and restarting HBase servers, and then validate that the test results are correct. This brings to the fore the availability and reliability aspects of the system. For example, if a RegionServer is killed, another RegionServer or a set of RegionServers should pick up the data that the killed RegionServer was serving.
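The RegionServer example can be sketched as a tiny self-contained simulation: kill a server, reassign its regions to the survivors, then check that every region is still being served. Everything here is hypothetical (the `ToyCluster` class, the round-robin reassignment, the server and region names). It does not use the API of the actual framework, and real HBase region assignment is far more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy chaos-style check. Hypothetical classes; not the HBase test framework API. */
final class ChaosTestSketch {

    static final class ToyCluster {
        // server name -> regions it currently serves
        private final Map<String, Set<String>> regionsByServer = new TreeMap<>();

        void addServer(String server, String... regions) {
            regionsByServer.put(server, new TreeSet<>(Arrays.asList(regions)));
        }

        /** Kills a server and hands its regions round-robin to the surviving servers. */
        void killServer(String server) {
            if (!regionsByServer.containsKey(server)) {
                throw new IllegalArgumentException("unknown server: " + server);
            }
            if (regionsByServer.size() < 2) {
                throw new IllegalStateException("no surviving server to take over");
            }
            Set<String> orphaned = regionsByServer.remove(server);
            List<String> survivors = new ArrayList<>(regionsByServer.keySet());
            int next = 0;
            for (String region : orphaned) {
                regionsByServer.get(survivors.get(next % survivors.size())).add(region);
                next++;
            }
        }

        /** All regions served by any live server. */
        Set<String> servedRegions() {
            Set<String> all = new TreeSet<>();
            for (Set<String> regions : regionsByServer.values()) {
                all.addAll(regions);
            }
            return all;
        }

        int serverCount() {
            return regionsByServer.size();
        }
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster();
        cluster.addServer("rs1", "region-a", "region-b");
        cluster.addServer("rs2", "region-c");
        cluster.addServer("rs3", "region-d");

        Set<String> before = cluster.servedRegions();
        cluster.killServer("rs1"); // the "chaos" action
        Set<String> after = cluster.servedRegions();

        // The invariant a chaos test validates: no region is lost when a server dies.
        System.out.println("all regions still served: " + after.equals(before));
    }
}
```

A real test would of course run a workload against a live cluster and verify the data itself, but the shape is the same: inject a failure, then assert an invariant that must still hold.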
Using the APIs provided by this testing framework, one can convert many of the tests in the HBase codebase to run in either unit test mode or in this new challenging “real cluster mode”. The test framework is part of the HBase codebase (via HBASE-6241), and many candidate tests have been identified that can be ported to use the new framework.
3. HBase on Microsoft Windows
The Microsoft Windows port and certification of HBase is an ongoing joint development effort involving Hortonworks and Microsoft engineers. We recently reached an important milestone, getting all of the hbase-0.94 unit tests passing on Windows. Work is underway to commit all the patches to HBase mainline under the umbrella jira HBASE-6814. We are well on the way to our goal of having HBase run equally well on Windows and Unix, opening up the Apache HBase community to a whole new universe of potential users and contributors.
4. HBase with NameNode HA setup and validation
We’ve been working to validate that HBase runs well with the new Apache Hadoop 1.0 HA features. The HBase HA testing is covered in a separate blog post.
5. Wire-compatibility work targeted for the 0.96.x release
We have done substantial work to move all protocols in HBase including the RPC implementation to use Google’s Protocol Buffers. Most of the work is captured in this umbrella jira – HBASE-5305.
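One reason Protocol Buffers help with wire compatibility is that every field on the wire carries a numeric tag, so a reader can skip fields it does not know about. The sketch below hand-rolls a small subset of the protobuf wire format (varint fields only) to show an "old" reader ignoring a field added by a "new" writer. It is an illustration under stated assumptions: the class and method names are hypothetical, it does not use the protobuf library, and the field numbers are made up rather than taken from any HBase message.

```java
import java.io.ByteArrayOutputStream;

/** Toy subset of the protobuf wire format (varint fields only). Hypothetical names. */
final class WireCompatSketch {

    /** Writes a base-128 varint: 7 bits per byte, high bit set on all but the last byte. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Writes a field as tag + value, where tag = (fieldNumber << 3) | wireType and wire type 0 is varint. */
    static void writeVarintField(ByteArrayOutputStream out, int fieldNumber, long value) {
        writeVarint(out, ((long) fieldNumber << 3) | 0);
        writeVarint(out, value);
    }

    /** Reads one varint starting at pos[0] and advances pos[0] past it. */
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    /** An "old" reader that only understands field 1 and skips every other varint field. */
    static long readFieldOne(byte[] message) {
        int[] pos = {0};
        long result = 0; // default when field 1 is absent
        while (pos[0] < message.length) {
            long tag = readVarint(message, pos);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (wireType != 0) {
                throw new IllegalArgumentException("sketch only handles varint fields");
            }
            long value = readVarint(message, pos);
            if (fieldNumber == 1) {
                result = value;
            }
            // Any other field number is unknown to this reader and is simply ignored.
        }
        return result;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarintField(out, 1, 300); // field the old reader knows
        writeVarintField(out, 2, 7);   // field added by a newer writer
        System.out.println("old reader sees field 1 = " + readFieldOne(out.toByteArray()));
    }
}
```

The same property works in the other direction: a newer reader that finds a field missing falls back to a default, so clients and servers on different versions can keep talking to each other.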
All of the above is just what we’ve been doing recently, and Hortonworkers are only a small fraction of the HBase contributor base. When one factors in all the great contributions coming from across the Apache HBase community, we predict 2013 is going to be a great year for HBase. HBase is maturing fast, becoming both more operationally reliable and more feature rich.