Apache™ HBase is a non-relational (NoSQL) database that runs on top of the Hadoop® Distributed File System (HDFS). It is a column-family-oriented (wide-column) store that provides fault-tolerant storage and fast access to large quantities of sparse data. It also adds transactional capabilities to Hadoop in the form of atomic row-level operations, allowing users to perform random updates, inserts, and deletes, which write-once HDFS files do not support on their own.
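To make "sparse" concrete: HBase's data model is commonly described as a sparse, sorted map from row key to column (family:qualifier) to timestamped versions of a value. The Python sketch below models that shape with plain dictionaries. The `SparseTable` class and its methods are hypothetical and are not the HBase client API; the sketch assumes a simple counter in place of real cell timestamps and, unlike HBase, removes deleted cells immediately instead of writing tombstone markers.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of HBase's logical layout (hypothetical, not the HBase API):
    row key -> "family:qualifier" -> {timestamp: value}.
    Only cells that were written are stored, so a sparse row costs nothing
    for the columns it lacks."""

    def __init__(self, families):
        self.families = set(families)   # column families are declared up front
        self.rows = defaultdict(dict)   # row key -> column -> versions
        self.clock = 0                  # stand-in for real cell timestamps

    def put(self, row, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.clock += 1
        # An update is simply a newer version of the same cell.
        self.rows[row].setdefault(column, {})[self.clock] = value

    def get(self, row, column):
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]  # the newest version wins

    def delete(self, row, column):
        # Simplification: drop all versions at once
        # (HBase writes a tombstone that compaction later cleans up).
        cells = self.rows.get(row)
        if cells is None:
            return
        cells.pop(column, None)
        if not cells:
            del self.rows[row]          # no cells left, so the row is gone

    def scan(self):
        # Rows are returned in sorted key order, as HBase stores them.
        for row in sorted(self.rows):
            yield row, {c: v[max(v)] for c, v in sorted(self.rows[row].items())}


t = SparseTable(["info", "stats"])
t.put("user#001", "info:name", "Ada")
t.put("user#002", "stats:logins", 3)
t.put("user#002", "stats:logins", 4)      # update = newer version
print(t.get("user#002", "stats:logins"))  # 4
t.delete("user#001", "info:name")
print(list(t.scan()))                     # [('user#002', {'stats:logins': 4})]
```

Absent cells are never stored, which is why a row can use only a handful of a table's possible columns without wasting space.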
What HBase Does
HBase provides random, real-time read/write access to your Big Data. It was created to host very large tables with billions of rows and millions of columns.
HBase provides the following benefits:
- Fault-tolerant storage for large quantities of data
- Flexible data model
- Easy-to-use Java API, plus Thrift and REST gateway APIs
- Near real-time lookups
- Atomic and strongly consistent row-level operations
- Automatic sharding and load balancing of tables
- Metrics export via File and Ganglia plugins
- High availability through automatic failover
- In-memory caching via the block cache, plus Bloom filters that let reads skip files that cannot contain the requested row
- Server-side processing via filters and coprocessors
- Replication across data centers
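The "automatic sharding" item refers to HBase splitting each table into regions: each region holds a contiguous, sorted range of row keys and is split in two once it grows too large. The Python sketch below shows only the range lookup and split logic. The `RegionMap` class and its row-count limit `max_rows` are hypothetical stand-ins; real HBase splits on data size and serves regions from RegionServers.

```python
import bisect


class RegionMap:
    """Toy range-sharding model (hypothetical, not HBase's implementation).
    Region i owns the half-open key range [starts[i], starts[i + 1]); the first
    region starts at "" so every row key maps to exactly one region."""

    def __init__(self, max_rows=4):
        self.max_rows = max_rows  # stand-in for HBase's size-based split limit
        self.starts = [""]        # sorted region start keys
        self.regions = [[]]       # sorted row keys held by each region

    def locate(self, row_key):
        # Index of the last region whose start key is <= row_key.
        return bisect.bisect_right(self.starts, row_key) - 1

    def insert(self, row_key):
        i = self.locate(row_key)
        region = self.regions[i]
        pos = bisect.bisect_left(region, row_key)
        if pos < len(region) and region[pos] == row_key:
            return                # row already present
        region.insert(pos, row_key)
        if len(region) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Split at the middle key; the upper half becomes a new region
        # whose start key is that middle key.
        region = self.regions[i]
        mid = len(region) // 2
        self.starts.insert(i + 1, region[mid])
        self.regions[i:i + 1] = [region[:mid], region[mid:]]


rm = RegionMap(max_rows=4)
for key in ["a", "c", "e", "g", "i", "k"]:
    rm.insert(key)
print(rm.starts)       # ['', 'e']
print(rm.locate("f"))  # 1
```

Load balancing in real HBase happens one level up: the master reassigns whole regions among RegionServers, which this sketch does not model.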
How HBase Works
Apache HBase uses Log-Structured Merge trees (LSM trees) to store and query data. It features compression, in-memory caching, Bloom filters, and fast scans over sorted row keys. HBase tables can serve as both the input and output for MapReduce jobs.
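An LSM tree buffers writes in a sorted in-memory structure (the MemStore in HBase) and periodically flushes it to immutable sorted files (HFiles in HBase). Reads check memory first and then the files from newest to oldest, and compaction merges files so reads stay cheap. The Python sketch below follows that write path entirely in memory. The `ToyLSM` class, the `flush_threshold` parameter, and the use of Python lists as stand-ins for HFiles are assumptions for illustration; real HBase adds a write-ahead log, the block cache, and Bloom filters on top.

```python
import bisect

TOMBSTONE = object()  # marker written by delete(); hides older values


class ToyLSM:
    """Minimal log-structured merge tree (illustrative, not HBase's code)."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical memstore limit
        self.memstore = {}                      # mutable, in memory
        self.files = []                         # immutable sorted runs, oldest first

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # Deletes are writes too: a tombstone shadows older versions.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Write the memstore out as a new sorted, immutable "file".
        if self.memstore:
            self.files.append(sorted(self.memstore.items(), key=lambda kv: kv[0]))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: memstore first, then files newest to oldest.
        if key in self.memstore:
            value = self.memstore[key]
        else:
            value = None
            for run in reversed(self.files):
                # (key,) sorts just before (key, value), so this binary search
                # finds the key without ever comparing stored values.
                i = bisect.bisect_left(run, (key,))
                if i < len(run) and run[i][0] == key:
                    value = run[i][1]
                    break
        return None if value is TOMBSTONE else value

    def compact(self):
        # Major compaction: merge every file into one run. Later files override
        # earlier ones, and tombstones can be dropped along with what they hide.
        merged = {}
        for run in self.files:  # oldest first, so newer writes win
            merged.update(run)
        run = sorted(((k, v) for k, v in merged.items() if v is not TOMBSTONE),
                     key=lambda kv: kv[0])
        self.files = [run] if run else []


db = ToyLSM(flush_threshold=2)
db.put("row1", "v1")
db.put("row2", "v2")       # memstore full, flushed as file 1
db.put("row1", "v1-new")
db.delete("row2")          # flushed as file 2, which shadows file 1
print(db.get("row1"), db.get("row2"))  # v1-new None
print(len(db.files))                   # 2
db.compact()
print(len(db.files), db.files[0])      # 1 [('row1', 'v1-new')]
```

Appending sorted runs instead of updating data in place is what keeps writes sequential; the cost moves to reads, which is why compaction and Bloom filters matter.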