
Apache Phoenix

OVERVIEW

Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop, using Apache HBase as its backing store. It enables developers to access large datasets in real time through a familiar SQL interface.

  • Standard SQL and JDBC APIs with full ACID transaction capabilities
  • Support for late-bound, schema-on-read with existing data in HBase
  • Access data stored and produced in other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce
*Apache Phoenix supports a subset of the SQL-92 standard.

What Phoenix does

Apache HBase provides random, real-time access to data in Hadoop and is well adopted in the Hadoop ecosystem. Apache Phoenix abstracts away the underlying data store by letting you query the data with standard SQL through a JDBC driver. Apache Phoenix also provides features such as secondary indexes, which help speed up queries without relying on specific row key designs.

Apache Phoenix is also massively parallel: aggregation queries are executed on the nodes where the data is stored, greatly reducing the need to send data over the network.
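The idea of aggregating where the data lives can be illustrated with a small, plain-Python simulation. This is not Phoenix code: the per-node step stands in for the server-side work Phoenix does inside HBase, and the table, its `category`/`amount` columns, and the function names are all hypothetical. Each node reduces its own rows to one partial result per group, so the client receives and merges only those partials rather than every row.

```python
from collections import defaultdict

# Hypothetical rows held by three nodes, as (category, amount) pairs.
# Conceptually answers: SELECT category, SUM(amount) ... GROUP BY category
regions = [
    [("a", 10), ("b", 5), ("a", 7), ("a", 2), ("b", 1)],
    [("b", 3), ("c", 8), ("c", 1), ("b", 4)],
    [("a", 1), ("c", 2), ("c", 4), ("a", 6)],
]

def aggregate_on_node(rows):
    """Partial aggregation where the data is stored: one sum per group."""
    partial = defaultdict(int)
    for key, amount in rows:
        partial[key] += amount
    return dict(partial)

def merge_partials(partials):
    """Client-side merge of the small per-node results."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)

partials = [aggregate_on_node(rows) for rows in regions]
result = merge_partials(partials)

rows_scanned = sum(len(rows) for rows in regions)  # rows read on the nodes
rows_sent = sum(len(p) for p in partials)          # rows shipped to the client

print(result)                    # {'a': 26, 'b': 13, 'c': 15}
print(rows_scanned, rows_sent)   # 13 6
```

The saving grows with the ratio of rows to groups: each node ships at most one row per group, no matter how many rows it scanned.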

Feature | Description
Familiar | Query data with a SQL-based language
Fast | Real-time queries
Reliable | Built on top of HBase, a proven data store
Platform agnostic | Hortonworks’ Phoenix provides ODBC connector drivers, allowing you to connect to your data using familiar BI tools.

How Phoenix works

Phoenix provides fast access to large amounts of data. A full table scan of 100M rows usually completes in 20 seconds (narrow table on a medium-sized cluster). This time comes down to a few milliseconds if the query contains a filter on key columns. For filters on non-key columns or non-leading key columns, you can add secondary indexes on those columns. An index makes a copy of the table with the indexed column(s) as part of the key, which gives performance equivalent to filtering on a key column.
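Why a key filter is cheap, a non-key filter is expensive, and an index closes the gap can be sketched with sorted Python lists standing in for HBase's sorted row keys. The table, its `(customer_id, order_id)` primary key, the `status` column, and the function names are hypothetical; real Phoenix indexes are HBase tables that Phoenix maintains for you, not in-memory lists.

```python
import bisect

# Hypothetical table: primary key (customer_id, order_id); "status" is a non-key column.
# Rows are kept sorted by primary key, the way HBase keeps rows sorted by row key.
table = sorted([
    (("c1", 1), "open"),
    (("c1", 2), "closed"),
    (("c2", 1), "open"),
    (("c3", 1), "closed"),
    (("c3", 2), "open"),
])
leading_key = [pk[0] for pk, _ in table]

def filter_on_leading_key(customer_id):
    """Range scan: binary-search the sorted keys and touch only matching rows."""
    lo = bisect.bisect_left(leading_key, customer_id)
    hi = bisect.bisect_right(leading_key, customer_id)
    return [pk for pk, _ in table[lo:hi]]

def filter_on_non_key(status):
    """No usable key prefix: every row must be examined (a full scan)."""
    return [pk for pk, s in table if s == status]

# Secondary index modeled as a copy of the table whose key starts with the
# indexed column, followed by the original primary key.
index = sorted((s, pk) for pk, s in table)
index_leading = [s for s, _ in index]

def filter_via_index(status):
    """Same answer as filter_on_non_key, found by a range scan on the index."""
    lo = bisect.bisect_left(index_leading, status)
    hi = bisect.bisect_right(index_leading, status)
    return [pk for _, pk in index[lo:hi]]

print(filter_on_leading_key("c3"))  # [('c3', 1), ('c3', 2)]
print(filter_on_non_key("open"))    # [('c1', 1), ('c2', 1), ('c3', 2)]
print(filter_via_index("open"))     # [('c1', 1), ('c2', 1), ('c3', 2)]
```

The trade-off is the usual one for indexes: the copy costs storage and must be updated on every write, in exchange for turning a full scan into a range scan.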

Why Phoenix is fast even when doing a full scan:

  1. Phoenix splits your query into chunks along region boundaries and runs the chunks in parallel on the client, using a configurable number of threads.
  2. Aggregation is done in a coprocessor on the server side, reducing the amount of data returned to the client instead of returning all of it.
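The two steps above can be mimicked in a short Python sketch: cut a scan range at region boundaries, run one task per chunk on a thread pool, and have each task return a single aggregated value. Everything here is hypothetical (the integer row keys, `REGION_STARTS`, the function names), and `scan_chunk` only stands in for the coprocessor; in Phoenix the per-chunk aggregation runs on the region servers, while the client threads issue the scans and merge the results.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

# Hypothetical store: sorted integer row keys 0..999, each with a small value.
ROWS = {key: key % 7 for key in range(1000)}
SORTED_KEYS = sorted(ROWS)

# Hypothetical region start keys: region i covers [starts[i], starts[i + 1]).
REGION_STARTS = [0, 250, 500, 750]

def split_by_regions(start, stop, region_starts):
    """Cut the scan range [start, stop) at every region boundary inside it."""
    cuts = [b for b in region_starts if start < b < stop]
    edges = [start] + cuts + [stop]
    return list(zip(edges, edges[1:]))

def scan_chunk(chunk):
    """Stand-in for server-side aggregation: return one value per chunk."""
    lo, hi = chunk
    i = bisect.bisect_left(SORTED_KEYS, lo)
    j = bisect.bisect_left(SORTED_KEYS, hi)
    return sum(ROWS[k] for k in SORTED_KEYS[i:j])

def parallel_sum(start, stop, threads=4):
    """Run one task per chunk on a configurable number of threads, then merge."""
    chunks = split_by_regions(start, stop, REGION_STARTS)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(scan_chunk, chunks))
    return chunks, sum(partials)

chunks, total = parallel_sum(100, 900)
print(chunks)  # [(100, 250), (250, 500), (500, 750), (750, 900)]
print(total == sum(ROWS[k] for k in range(100, 900)))  # True
```

Splitting on region boundaries means each chunk is served by a single region, so chunks can proceed independently; the `threads` argument plays the role of the configurable client thread count mentioned above.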

Recent Releases

This table summarizes recent innovations in Apache Phoenix.

Version Progress
Updates in HDP 2.6
  • Incremental index rebuild
  • Data integrity check tool (Tech preview)
  • Python-phoenix connector (Community supported)
Updates in HDP 2.5
  • Developer and BI tool interface
    • .NET driver
    • ODBC driver
    • Support for querying Phoenix data in Hive
  • Engine enhancements
    • Support for OFFSET in SQL queries
  • Multitenancy
    • Support for namespaces
    • Support for HBase RegionServer groups