Big data showdown: Cassandra vs. HBase

Rick Grehan | April 3, 2014
Bigtable-inspired open source projects take different routes to the highly scalable, highly flexible, distributed, wide column data store

HBase has also introduced "coprocessors," which allow the execution of user code in the context of the HBase processes. The result is roughly comparable to the relational database world's triggers and stored procedures. Cassandra currently has no counterpart to HBase's coprocessors.
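To make the trigger analogy concrete, here is a minimal sketch of the general idea: user-supplied code that runs inside the data store on every write. It is a toy in-memory model, not HBase's coprocessor API; `ToyTable`, `register_pre_put`, and `normalize_email` are hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout: ToyTable and its hook signature are
# illustrative only and are NOT part of HBase's coprocessor API.
PrePutHook = Callable[[str, Dict[str, str]], Dict[str, str]]


class ToyTable:
    """In-memory stand-in for a table that runs user hooks on the server side."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}
        self.pre_put_hooks: List[PrePutHook] = []

    def register_pre_put(self, hook: PrePutHook) -> None:
        """Attach user code that will run before every write."""
        self.pre_put_hooks.append(hook)

    def put(self, row_key: str, columns: Dict[str, str]) -> None:
        # Each hook sees the write before it is stored and may rewrite it,
        # much as a "before insert" trigger would in a relational database.
        for hook in self.pre_put_hooks:
            columns = hook(row_key, columns)
        self.rows.setdefault(row_key, {}).update(columns)


def normalize_email(row_key: str, columns: Dict[str, str]) -> Dict[str, str]:
    """Example hook: store email addresses trimmed and lowercased."""
    if "email" in columns:
        columns = {**columns, "email": columns["email"].strip().lower()}
    return columns


table = ToyTable()
table.register_pre_put(normalize_email)
table.put("user1", {"email": "  Alice@Example.COM "})
```

The client only calls `put`; the normalization happens inside the table, which is the property that makes this kind of hook comparable to a trigger.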

Cassandra's documentation is noticeably better than HBase's, and good documentation certainly flattens the learning curve. In my experience, setting up a development Cassandra cluster is simpler than setting up an HBase cluster. Of course, this is only important for development and testing purposes.

An HBase master node hosts a Web interface on port 60010. Here you can browse information such as the node's execution history, tables managed by the node, and region servers in the master's domain.
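As a small convenience, the sketch below builds the master UI address and checks whether anything is listening on that port. The port default comes from the text above; the hostname in the usage comment and the helper names `master_ui_url` and `port_is_open` are assumptions made for illustration, and your deployment may be configured to use a different port.

```python
import socket

# Port taken from the article; treat it as a default that a deployment may override.
DEFAULT_MASTER_UI_PORT = 60010


def master_ui_url(host: str, port: int = DEFAULT_MASTER_UI_PORT) -> str:
    """Return the HTTP address of the master's web interface."""
    return f"http://{host}:{port}/"


def port_is_open(host: str, port: int = DEFAULT_MASTER_UI_PORT,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (hypothetical hostname):
# if port_is_open("hbase-master.example.com"):
#     print(master_ui_url("hbase-master.example.com"))
```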

The win column
The real work begins when you must tune a cluster for your particular application. Given the size of the data sets involved and the complexity of building and managing a multinode cluster (which often spans multiple data centers), tuning is hardly straightforward. It demands a solid understanding of the interplay of the cluster's memory caching, disk storage, and internode communications, and it requires careful monitoring of cluster behavior.

It's true that HBase's reliance on ZooKeeper, a separate application, introduces an additional point of failure (and the attendant difficulty of troubleshooting the source of a problem) that Cassandra avoids. But it isn't the case that tuning a Cassandra cluster is orders of magnitude less difficult. In the end, comparing the travails of cluster tuning for both databases, it's probably a wash.

Which means, as usual, there is no clear winner or loser. You'll find zealots for both databases, and each camp will present compelling evidence demonstrating the superiority of their system. And as usual, you'll face the chore of taking each for a test drive and benchmarking them against your target application. But given the scope of these technologies, could there be any other way?

Cassandra pros:
  • Symmetric architecture makes it relatively easy to create and scale large clusters
  • SQL-like Cassandra Query Language eases developers' transition from RDBMS
  • Allows you to tune for performance or consistency or a balance of both
  • Community edition of management GUI available
  • Good documentation (provided by DataStax)

HBase pros:
  • Built-in versioning
  • Strong consistency at the record level
  • Provides RDBMS-like triggers and stored procedures through coprocessors
  • Built on tried-and-true Hadoop technologies
  • Active development community

Cassandra cons:
  • Configuration is complex
  • Current trigger/stored procedure mechanism experimental
  • Management GUI difficult to get up and running

HBase cons:
  • Lacks friendly, SQL-like query language
  • Lots of moving parts
  • Setup beyond a single-node development cluster can be difficult
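
The bullet about tuning for performance or consistency can be illustrated with the standard replica-overlap rule for quorum-style systems: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below is plain arithmetic under that assumption; the function names are hypothetical and it does not call any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_acks: int,
                           read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor


# With three replicas, quorum writes plus quorum reads overlap on at least
# one replica, so reads are consistent at the cost of waiting for two nodes.
rf = 3
consistent = reads_see_latest_write(rf, quorum(rf), quorum(rf))

# Writing to one replica and reading from one replica is faster, but a read
# may land on a replica that has not yet received the write.
fast_but_eventual = reads_see_latest_write(rf, 1, 1)
```

Choosing where to sit between those two settings, per query, is the trade-off the bullet refers to.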

 
