Apache HBase

A non-relational (NoSQL) database that runs on top of HDFS

Apache™ HBase is a non-relational (NoSQL) database that runs on top of the Hadoop® Distributed File System (HDFS). It is column-oriented and provides fault-tolerant storage and quick access to large quantities of sparse data. It also adds transactional capabilities to Hadoop at the row level, allowing users to update, insert and delete individual records.
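
As a rough illustration of what "sparse" storage with updates, inserts and deletes can look like, here is a minimal in-memory sketch in Python. It is not the HBase API: the class `ToyTable`, its method names, the integer clock and the sample row keys are all hypothetical, chosen only to mirror the general idea that a cell is addressed by row key and column, that an update is a newer version of a cell, and that a delete is recorded as a marker rather than an in-place erase.

```python
from collections import defaultdict

# Hypothetical marker object; stands in for a delete marker (tombstone).
_TOMBSTONE = object()


class ToyTable:
    """Toy model of a sparse, versioned table. Not the real HBase client."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value) versions
        self._rows = defaultdict(lambda: defaultdict(list))
        self._clock = 0  # assumption: a simple counter replaces real timestamps

    def _next_ts(self):
        self._clock += 1
        return self._clock

    def put(self, row, column, value):
        # Insert and update are the same operation: append a newer version.
        self._rows[row][column].append((self._next_ts(), value))

    def delete(self, row, column):
        # A delete is written as a newer marker version, not an in-place erase.
        self._rows[row][column].append((self._next_ts(), _TOMBSTONE))

    def get(self, row, column):
        # Return the newest version, or None if absent or deleted.
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        value = versions[-1][1]
        return None if value is _TOMBSTONE else value


table = ToyTable()
table.put("user#42", "info:name", "Ada")       # insert
table.put("user#42", "info:name", "Ada L.")    # update (new version)
table.put("user#7", "stats:logins", 3)         # a different row, different columns
print(table.get("user#42", "info:name"))       # Ada L.
table.delete("user#42", "info:name")
print(table.get("user#42", "info:name"))       # None
```

Only the cells that were actually written take up space here, which is the sense in which rows with different columns can coexist cheaply in one table.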

What HBase Does

HBase provides random, real-time read and write access to your Big Data. It was created to host very large tables with billions of rows and millions of columns.

HBase provides the following benefits:

  • Fault-tolerant storage for large quantities of data
  • Flexible data model
  • Easy-to-use Java API, plus Thrift and REST gateway APIs
  • Near real-time lookups
  • Atomic and strongly consistent row-level operations
  • Automatic sharding and load balancing of tables
  • Metrics exports via File and Ganglia plugins
  • High availability through automatic failover
  • In-memory caching via block cache and bloom filters
  • Server-side processing via filters and coprocessors
  • Replication across data centers
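
The "automatic sharding" item above can be pictured with a small sketch. HBase divides a table into regions by row-key range; the Python below imitates that idea only loosely. The class `ToyRegionMap`, its methods and the row-count split threshold are assumptions made for illustration (real region splits are driven by data size and configurable policies, not by this rule).

```python
import bisect


class ToyRegionMap:
    """Toy model of range-based sharding by row key. Not HBase code."""

    def __init__(self, max_rows_per_region=4):
        self.start_keys = [""]   # sorted region start keys; "" opens the first region
        self.regions = [[]]      # sorted row keys held by each region
        self.max_rows = max_rows_per_region  # assumption: split on row count

    def locate(self, row_key):
        # The owning region is the one with the greatest start key <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        idx = self.locate(row_key)
        rows = self.regions[idx]
        pos = bisect.bisect_left(rows, row_key)
        if pos == len(rows) or rows[pos] != row_key:
            rows.insert(pos, row_key)
        if len(rows) > self.max_rows:
            self._split(idx)

    def _split(self, idx):
        # Split a full region at its middle key into two adjacent regions.
        rows = self.regions[idx]
        mid = len(rows) // 2
        left, right = rows[:mid], rows[mid:]
        self.regions[idx] = left
        self.regions.insert(idx + 1, right)
        self.start_keys.insert(idx + 1, right[0])


table = ToyRegionMap(max_rows_per_region=4)
for i in range(10):
    table.put(f"row{i:02d}")
print(table.start_keys)        # ['', 'row02', 'row04', 'row06']
print(table.locate("row05"))   # 2
```

Because regions are contiguous key ranges, finding the owner of a row is a binary search over start keys, and a growing table spreads across more regions without the client choosing a shard.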

How HBase Works

Apache HBase uses log-structured merge trees (LSM trees) to store and query data. It features compression, in-memory caching, bloom filters, and very fast scans. HBase tables can serve as both the input and the output for MapReduce jobs.
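
To make the LSM-tree idea more concrete, the following is a minimal Python sketch of the general technique: writes land in an in-memory buffer, full buffers are flushed as immutable sorted segments, reads check the buffer and then segments from newest to oldest, and compaction merges segments. The class `ToyLSMStore`, its method names and the tiny flush threshold are hypothetical; this omits write-ahead logging, bloom filters, compression and everything else a real store needs.

```python
import bisect


class ToyLSMStore:
    """Toy log-structured merge store. A sketch of the idea, not HBase internals."""

    def __init__(self, memstore_limit=3):
        self.memstore = {}     # in-memory write buffer
        self.segments = []     # immutable sorted lists of (key, value), oldest first
        self.memstore_limit = memstore_limit  # assumption: flush by entry count

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted, immutable segment.
        if self.memstore:
            self.segments.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: check the buffer, then segments newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self):
        # Merge all segments into one, keeping the newest value per key.
        merged = {}
        for segment in self.segments:   # oldest first, so newer values overwrite
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []


store = ToyLSMStore(memstore_limit=2)
store.put("a", 1)
store.put("b", 2)      # buffer is full, so it is flushed as segment 1
store.put("a", 3)
store.put("c", 4)      # flushed as segment 2; "a" now exists in both segments
print(store.get("a"))  # 3
store.compact()
print(store.segments)  # [[('a', 3), ('b', 2), ('c', 4)]]
```

Keeping every segment sorted is what makes range scans and merges cheap, at the cost of reads sometimes consulting several segments until compaction reduces their number.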

Apache Top-Level Project Since: May 2010
Hortonworks Committers: 6

Try HBase with Sandbox

Hortonworks Sandbox is a self-contained virtual machine with HDP running alongside a set of hands-on, step-by-step Hadoop tutorials.
