Bridging the NoSQL gap: Apache Phoenix
Apache Phoenix is a top-level Apache project that provides an SQL interface to HBase, mapping HBase models to a relational database world. Of course, HBase provides its own API and shell for performing functions like scan, get, put, list, and so forth, but more developers are familiar with SQL than NoSQL. The goal of Phoenix is to provide a commonly understood interface for HBase.
In terms of features, Phoenix does the following:
- Provides a JDBC driver for interacting with HBase.
- Supports much of the ANSI SQL standard.
- Supports DDL operations such as CREATE TABLE, DROP TABLE, and ALTER TABLE.
- Supports DML operations such as UPSERT and DELETE.
- Compiles SQL queries into native HBase scans and then maps the response to JDBC ResultSets.
- Supports versioned schemas.
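To make the feature list concrete, here is a minimal JDBC sketch of a DDL statement, an UPSERT, and a query. The table name `demo_users`, its columns, and the `localhost` ZooKeeper quorum are hypothetical, and the example assumes the Phoenix client JAR is on the classpath and an HBase instance is reachable; without them, it only prints a connection error.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PhoenixQuickstart {
    // Hypothetical table and columns, used only for illustration.
    static final String CREATE_SQL =
        "CREATE TABLE IF NOT EXISTS demo_users (id BIGINT NOT NULL PRIMARY KEY, name VARCHAR)";
    static final String UPSERT_SQL = "UPSERT INTO demo_users (id, name) VALUES (?, ?)";
    static final String SELECT_SQL = "SELECT id, name FROM demo_users WHERE id = ?";

    // Builds a Phoenix JDBC URL from a ZooKeeper quorum (assumed to be "localhost" below).
    static String jdbcUrl(String zkQuorum) {
        return "jdbc:phoenix:" + zkQuorum;
    }

    // Creates the table if needed, writes one row, and reads its name back.
    static String roundTrip(Connection conn, long id, String name) throws SQLException {
        try (Statement ddl = conn.createStatement()) {
            ddl.execute(CREATE_SQL);
        }
        try (PreparedStatement upsert = conn.prepareStatement(UPSERT_SQL)) {
            upsert.setLong(1, id);
            upsert.setString(2, name);
            upsert.executeUpdate();
        }
        conn.commit(); // Phoenix connections do not auto-commit by default
        try (PreparedStatement query = conn.prepareStatement(SELECT_SQL)) {
            query.setLong(1, id);
            try (ResultSet rs = query.executeQuery()) {
                return rs.next() ? rs.getString(2) : null;
            }
        }
    }

    public static void main(String[] args) {
        String zk = args.length > 0 ? args[0] : "localhost";
        try (Connection conn = DriverManager.getConnection(jdbcUrl(zk))) {
            System.out.println(roundTrip(conn, 1L, "example"));
        } catch (SQLException e) {
            // Typically a missing driver JAR or an unreachable cluster.
            System.err.println("Could not run the Phoenix round trip: " + e.getMessage());
        }
    }
}
```

Note that Phoenix uses UPSERT rather than separate INSERT and UPDATE statements, which is why the write above is the same whether or not the row already exists.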
In addition to supporting a broad set of SQL operations, Phoenix is designed for performance. It analyzes each SQL query, breaks it down into multiple HBase scans, and runs those scans in parallel, using the native HBase API instead of MapReduce jobs.
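The idea of splitting one query into several range scans that run in parallel can be sketched without HBase at all. In the following illustration, a `TreeMap` stands in for a sorted HBase table, and the split points and thread count are arbitrary assumptions; it is not how Phoenix is implemented internally, only a picture of the general technique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScanSketch {
    // Sums values over the ranges [boundaries[i], boundaries[i+1]), one task per range.
    static long parallelSum(NavigableMap<String, Integer> table, List<String> boundaries) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // arbitrary pool size
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int i = 0; i + 1 < boundaries.size(); i++) {
                final String start = boundaries.get(i);
                final String stop = boundaries.get(i + 1);
                // Each task scans only its own key range (start inclusive, stop exclusive).
                Callable<Long> scan = () -> {
                    long sum = 0;
                    for (int value : table.subMap(start, true, stop, false).values()) {
                        sum += value;
                    }
                    return sum;
                };
                partials.add(pool.submit(scan));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // combine the per-range results
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while scanning", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A range scan failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> table = new TreeMap<>();
        table.put("apple", 1);
        table.put("kiwi", 2);
        table.put("zebra", 3);
        // "{" sorts just after "z", so the last range covers the remaining lowercase keys.
        List<String> boundaries = Arrays.asList("a", "h", "p", "{");
        System.out.println("total = " + parallelSum(table, boundaries));
    }
}
```

Because the ranges do not overlap and together cover all the keys, combining the partial results gives the same answer as one sequential scan.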
Phoenix uses two strategies--co-processors and custom filters--to bring computations closer to the data:
- Co-processors perform operations on the server, which minimizes client/server data transfer.
- Custom filters reduce the amount of data returned in a query response from the server, which further reduces the amount of transferred data. Custom filters are used in a few ways:
  - When executing a query, a custom filter can be used to identify only the essential column families required to satisfy the search.
  - A skip scan filter uses HBase's SEEK_NEXT_USING_HINT to quickly navigate from one record to the next, which speeds up point queries.
  - A custom filter can "salt the data," meaning that it adds a hash byte at the beginning of the row key so that it can quickly locate records.
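The salting idea in the last item can be illustrated with a few lines of Java. This is a minimal sketch under stated assumptions: `Arrays.hashCode` is a stand-in hash rather than the one Phoenix uses, and the bucket count of 4 is arbitrary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SaltedKeySketch {
    // Prepends one salt byte, derived from a hash of the key, to the row key.
    // Assumes 1 <= buckets <= 127 so the bucket number fits in a signed byte.
    static byte[] salt(byte[] rowKey, int buckets) {
        int hash = Arrays.hashCode(rowKey); // stand-in hash, not Phoenix's own
        byte saltByte = (byte) Math.floorMod(hash, buckets);
        byte[] salted = new byte[rowKey.length + 1];
        salted[0] = saltByte;
        System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
        return salted;
    }

    public static void main(String[] args) {
        int buckets = 4; // hypothetical bucket count
        for (String key : new String[] {"row-0001", "row-0002", "row-0003"}) {
            byte[] salted = salt(key.getBytes(StandardCharsets.UTF_8), buckets);
            System.out.println(key + " -> bucket " + salted[0]);
        }
    }
}
```

Because the salt byte is computed from the key itself, the same key always maps to the same bucket, so a reader can recompute the prefix instead of searching every bucket. In Phoenix, salting is requested per table with the SALT_BUCKETS option in CREATE TABLE.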
In sum, Phoenix leverages direct access to HBase APIs, co-processors, and custom filters to give you millisecond-level performance for small datasets and second-level performance for humongous ones. Above all, Phoenix exposes these capabilities to developers via a familiar JDBC and SQL interface.
Get started with Phoenix
To use Phoenix, you need to download and install both HBase and Phoenix. The Phoenix download page, which also lists HBase compatibility notes, is on the Apache Phoenix website.
Download and setup
At the time of this writing, the latest version of Phoenix is 4.6.0, and the download page states that 4.x is compatible with HBase version 0.98.1+. For my example, I downloaded the latest version of Phoenix that is configured to work with HBase 1.1. You can find it in the folder:
Here's the setup:
- Download and decompress this archive and then use one of the recommended mirror pages here to download HBase. For instance, I selected a mirror, navigated into the 1.1.2 folder, and downloaded
- Decompress this file and create an HBASE_HOME environment variable that points to it; for example, I added the following to my ~/.bash_profile file (on Mac):