Apache HBase is the online database natively integrated with Hadoop, making HBase the obvious choice for applications that rely on Hadoop’s scale and flexible data processing. With the Hortonworks Data Platform 2.2, HBase High Availability has taken a major step forward, allowing apps on HBase to deliver 99.99% uptime guarantees. This blog takes a look at how HBase High Availability has improved over the past 12 months and how it will improve even more in the future.
High Availability is a key attribute for any database and a prerequisite for any business-critical application. Traditionally, HBase relied on two strategies to keep data available. First, HBase automatically partitions data and distributes the partitions across nodes; losing a node affects only the data held on that node, while other data remains unaffected. Second, HBase stores all of its data in HDFS, which keeps three copies (by default) on different machines, available to every node in the cluster. This lets HBase automatically redistribute data from failed nodes onto nodes that are still running.
Combining these innate HA properties with Hadoop best practices allowed HBase apps to reasonably deliver 99.9% availability, which translates to an aggregate downtime of a bit under 9 hours per year. This is suitable for many applications, but mission-critical applications need better guarantees.
We are in the early stages of a massive replatforming of data applications to Hadoop. Whether applications need greater scale or more flexible data processing, Hadoop has become the de facto choice because of its mindshare and momentum. Since HBase is the Hadoop database, it is the obvious choice for any online application that wants to benefit from Hadoop’s ubiquity and rapid innovation.
As we spoke to customers looking to migrate business-critical workloads to HBase, we consistently heard that they needed the data consistency HBase provides, but couldn’t tolerate even small windows of downtime. In order for Hadoop to host mission-critical online applications, HBase’s High Availability capabilities needed to take a major step forward.
Hortonworks worked within the HBase community to deliver this major step forward in HA by introducing Timeline-Consistent Region Replicas, also called “HBase Read HA” and tracked by HBASE-10070. At a high level, this new HA feature maintains multiple copies of the same data in Primary Region Replicas and Secondary Region Replicas that are spread throughout the HBase cluster. With HBase Read HA, if a RegionServer fails, its data can still be read from a separate RegionServer. In other words, you retain read availability and lose only write availability during automatic recovery. This makes HBase Read HA a great option for read-heavy workloads that need consistency and no downtime. Combined with best practices such as using two replicas and rack awareness, HBase Read HA allows HBase to deliver 99.99% availability for these mission-critical applications.
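As a sketch of how region replicas are enabled (shell syntax as of HBase 1.0-era releases; the table name `t1` and column family `f1` are placeholders, not from this post):

```
# In the hbase shell, request one secondary replica per region at creation time:
create 't1', 'f1', {REGION_REPLICATION => 2}

# Or add replicas to an existing table (the table must be disabled first):
disable 't1'
alter 't1', {REGION_REPLICATION => 2}
enable 't1'
```

A value of 2 means one primary plus one secondary per region, matching the two-replica best practice mentioned above.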
HBase ensures data consistency by routing all write operations for a key through the single RegionServer responsible for that key. On the plus side, this makes data consistency extremely simple: one owner means no split-brain scenarios and no last-write-wins conflicts, and it makes features like atomic counters fast and trivial to implement.
On the minus side, if a RegionServer is lost, the entire key range owned by that RegionServer is offline until its data is recovered. In HBase 0.96 this recovery process was optimized to the point where it typically takes less than a minute, but we have still traded some availability away to guarantee strong consistency.
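Because each key has exactly one owning RegionServer, an atomic counter is a single server-side operation. A minimal client sketch (illustrative only: it assumes a running HBase cluster, the hbase-client library on the classpath, and a hypothetical table `metrics` with column family `d`):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("metrics"))) {
            // Atomically add 1 to the counter cell; the single owning
            // RegionServer serializes all increments for this row key,
            // so no read-modify-write race is possible on the client.
            long hits = table.incrementColumnValue(
                Bytes.toBytes("page#home"),   // row key
                Bytes.toBytes("d"),           // column family
                Bytes.toBytes("hits"),        // qualifier
                1L);                          // amount to add
            System.out.println("hits = " + hits);
        }
    }
}
```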
The CAP Theorem taught us that, in the presence of network partitions, a distributed system must compromise between consistency and availability; no system can deliver both all of the time. Many modern databases optimize for availability by implementing a pure AP model, giving up strong consistency entirely. Giving up consistency leads to a database that forces its users to understand complex topics in distributed systems. In many ways, using an eventually consistent database turns you into a database developer rather than a database user.
The truth is that network partitions don’t happen all the time, so it’s not necessary to sacrifice consistency all of the time to protect against failures that happen some of the time. A good discussion of this, and of Timeline Consistency overall can be read over at Daniel Abadi’s blog.
HBase Read HA implements a Timeline Consistency system which offers developers the ability to choose between strict consistency and relaxed consistency at query time.
With HBase Read HA:
- Each region has one primary replica, which continues to receive all writes and to serve strongly consistent reads.
- One or more secondary replicas are hosted on other RegionServers, holding read-only copies of the region’s data that are kept up to date asynchronously.
And from the client’s perspective:
- Reads issued with the default STRONG consistency behave exactly as before and are served only by the primary.
- Reads issued with TIMELINE consistency may be answered by a secondary when the primary is slow or unavailable, and any result that may lag the primary is explicitly flagged as stale.
This model has several advantages:
- The default behavior is unchanged: strongly consistent reads and writes, exactly as HBase has always worked.
- Relaxed consistency is opt-in and chosen per request, rather than imposed cluster-wide.
- When a result does come from a secondary, the client is told explicitly, so staleness is never silent.
Timeline Consistency combines strong consistency with graceful degradation during failures, meaning you get higher availability without burdening developers with the complexities of reasoning about eventually consistent systems.
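The per-request choice looks like this on the client side (an illustrative sketch, not standalone-runnable: it assumes a running HBase cluster with region replicas enabled, the hbase-client library, and a hypothetical table `users`):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            Get get = new Get(Bytes.toBytes("row1"));
            // Opt into relaxed consistency for this one request only;
            // a secondary replica may answer if the primary is slow or down.
            get.setConsistency(Consistency.TIMELINE);
            Result result = table.get(get);
            if (result.isStale()) {
                // Served by a secondary: the data may lag the primary,
                // and the application can decide how to handle that.
            }
        }
    }
}
```

Omitting `setConsistency` (or passing `Consistency.STRONG`) gives the classic strongly consistent read from the primary.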
Development of HBase Read HA happened in two phases. In many ways, Phase 1 was used to prove the model and API semantics, while Phase 2 delivers a version suitable for production. If you looked at HBase Read HA in HDP 2.1 and decided it wasn’t ready due to feature gaps like split/merge support, HDP 2.2 lets you use HA with all the HBase features you expect.
HBase Read HA is delivered as part of Hortonworks Data Platform 2.2. If you’re interested in improving your app’s availability we encourage you to try it out. Get started by reading about HBase Read HA in the HBase documentation. You can also learn more about HA by watching the HBaseCon session.
HBase High Availability has improved dramatically over the past 12 months, but more work remains. Today, HBase doesn’t address two important areas:
- Write availability: the regions owned by a failed RegionServer cannot accept writes until automatic recovery completes.
- Strongly consistent reads during failures: secondary replicas serve only timeline-consistent reads, so clients that require strict consistency must still wait for recovery.
We’re very excited to see that these issues are beginning to be addressed in the community by way of HBASE-12259, which tracks the effort to bring Facebook’s HydraBase into Apache HBase. In the future we’ll see HBase offer up to five nines (99.999%) of availability while retaining the strong data consistency that business-critical systems require.