Meet the Committer: 3 Minutes on Apache HBase with Enis Soztutar
We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.
This week Enis Soztutar discusses Apache HBase, built for random read/write access to data in billions of rows and millions of columns.
Enis began using Apache Hadoop in 2006. He is now a Hortonworks engineer and Apache HBase project management chair, and he has been a committer to Apache Hadoop since 2007 and to HBase since 2012.
In this brief video, Enis describes what HBase is, why it was created, and how it works.
What is HBase?
- A NoSQL, non-relational database that runs on top of the Hadoop Distributed File System (HDFS)
- Designed to scale on commodity hardware
- Designed to be distributed: data storage can be spread across an array of independent machines
- Intended to run on top of Hadoop
Why was HBase created?
- Apache HBase is an open source implementation of the design described in Google’s Bigtable paper
- Built for random read/write access to enormous data sets, with billions of rows and millions of columns
How does HBase work?
- HBase indexes data with a row key, a column key and a timestamp
- Key/value pairs are sorted lexicographically by key (alphabetically, for plain text keys), as in this fictional example with only three data elements:
  - “aaa” : “This is the value in the first row”
  - “abc” : “Second row”
  - “zzz” : “A quick brown fox jumps over the lazy dog”
- Used by many enterprises such as Yahoo!, Facebook and Twitter
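To make the indexing scheme concrete, here is a minimal in-memory sketch in Python. It is not HBase code and does not use the HBase client API; the class `TinySortedStore`, its method names, and the column name `text` are hypothetical, chosen only for illustration. It assumes each value is addressed by a (row key, column key, timestamp) triple, that keys are kept in sorted byte order, and that newer versions of a cell sort ahead of older ones.

```python
import bisect


class TinySortedStore:
    """Toy model of a sorted, versioned key/value store (illustration only, not HBase)."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp) tuples
        self._values = {}  # maps each key tuple to its value

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so that newer versions sort first (an assumption of this sketch).
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value stored for (row, column), or None if absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, stop_row):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row, in key order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        for key in self._keys[i:]:
            row, column, neg_ts = key
            if row >= stop_row:
                break
            yield row, column, -neg_ts, self._values[key]


# The three fictional rows from the example above, inserted out of order.
store = TinySortedStore()
store.put(b"zzz", b"text", 1, b"A quick brown fox jumps over the lazy dog")
store.put(b"aaa", b"text", 1, b"This is the value in the first row")
store.put(b"abc", b"text", 1, b"Second row")
store.put(b"abc", b"text", 2, b"Second row, updated")  # a newer version of the same cell

for row, column, timestamp, value in store.scan(b"a", b"b"):
    print(row, column, timestamp, value)

print(store.get(b"abc", b"text"))
```

Because the keys stay sorted, a lookup is a binary search and a range scan is a sequential walk from the first matching key, which is the property that makes random reads and ordered scans cheap in a design like this.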
Watch the Hortonworks blog for a technical post about the upcoming release of HBase version 0.96.
Also, take a look at past Hortonworks blog posts discussing HBase.