Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. It enables developers to access large datasets in real time through a familiar SQL interface.
Apache HBase provides random, real-time access to data in Hadoop and is widely adopted in the Hadoop ecosystem. Apache Phoenix abstracts away the underlying data store by enabling you to query the data with standard SQL through a JDBC driver. It also provides features such as secondary indexes, which help speed up queries without relying on a specific row key design.
Apache Phoenix is also massively parallel: aggregation queries are executed on the nodes where the data is stored, which greatly reduces the need to send data over the network.
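To make the idea of aggregating where the data lives more concrete, here is a toy model in plain Python. It is not Phoenix code: the `regions` list, the host names, and the `partial_aggregate` and `merge_partials` helpers are all hypothetical and only illustrate the general pattern of computing small partial results next to the data and merging them on the client.

```python
# Hypothetical data: each "region" is a list of (host, cpu_usage) rows
# that is assumed to live on a different node.
regions = [
    [("web1", 10), ("web1", 20), ("web2", 30)],
    [("web1", 30), ("web3", 50), ("web3", 70)],
    [("web2", 40), ("web2", 50), ("web3", 60)],
]


def partial_aggregate(rows):
    """Runs next to the data: reduce one region to per-host (sum, count)."""
    partial = {}
    for host, cpu in rows:
        total, count = partial.get(host, (0, 0))
        partial[host] = (total + cpu, count + 1)
    return partial


def merge_partials(partials):
    """Runs on the client: combine the small per-region summaries."""
    merged = {}
    for partial in partials:
        for host, (total, count) in partial.items():
            t, c = merged.get(host, (0, 0))
            merged[host] = (t + total, c + count)
    # Turn the merged (sum, count) pairs into averages.
    return {host: total / count for host, (total, count) in merged.items()}


partials = [partial_aggregate(rows) for rows in regions]
avg_cpu = merge_partials(partials)

rows_stored = sum(len(rows) for rows in regions)
entries_shipped = sum(len(p) for p in partials)
print(avg_cpu, rows_stored, entries_shipped)
```

In this sketch only the per-region summaries cross the "network", not the raw rows. The saving grows with the number of rows per group, since each region sends at most one entry per host regardless of how many rows it holds.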
|Familiar|Query data with a SQL-based language|
|Reliable|Built on top of the proven HBase data store|
|Platform agnostic|Hortonworks’ Phoenix provides ODBC connector drivers, allowing you to connect to your dataset using familiar BI tools.|
Phoenix provides fast access to large amounts of data. A full table scan of 100M rows usually completes in 20 seconds (narrow table on a medium-sized cluster). This drops to a few milliseconds if the query contains a filter on key columns. For filters on non-key columns or non-leading key columns, you can add secondary indexes on those columns. An index gives performance equivalent to filtering on a key column, because it makes a copy of the table in which the indexed column(s) are part of the key.
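The following sketch models why an index that copies the table with the indexed column in the key turns a scan into a key lookup. It is a simplified illustration in plain Python, not how Phoenix stores data: the `table` contents, the `city` column, and the `full_scan` and `index_lookup` helpers are hypothetical. In Phoenix itself the index would be declared with a `CREATE INDEX` statement rather than built by hand.

```python
import bisect

# Hypothetical table: the primary key is user_id, "city" is a non-key column.
table = {
    1: {"name": "Ann", "city": "Oslo"},
    2: {"name": "Bob", "city": "Lima"},
    3: {"name": "Cho", "city": "Oslo"},
    4: {"name": "Dee", "city": "Rome"},
}


def full_scan(city):
    """Without an index: every row must be examined."""
    return sorted(uid for uid, row in table.items() if row["city"] == city)


# The "index" is a second, sorted copy keyed by (city, user_id), so the
# indexed column leads the key.
index = sorted((row["city"], uid) for uid, row in table.items())


def index_lookup(city):
    """With the index: binary-search to the key range, then read only matches."""
    start = bisect.bisect_left(index, (city,))
    matches = []
    for indexed_city, uid in index[start:]:
        if indexed_city != city:
            break  # left the contiguous range for this city
        matches.append(uid)
    return matches


print(full_scan("Oslo"), index_lookup("Oslo"))
```

Both functions return the same rows, but `index_lookup` touches only the contiguous key range for the requested city. The trade-off, as the paragraph above notes, is that the index is a copy of the data and must be kept alongside the original table.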
Why Phoenix is fast even when doing a full scan:
Phoenix provides the following enhancements in the HDP 2.5 release: