Cassandra lowers the barriers to big data

Rick Grehan | March 25, 2014
Apache Cassandra is a free, open source NoSQL database designed to manage very large data sets (think petabytes) across large clusters of commodity servers. Among many distinguishing features, Cassandra excels at scaling writes as well as reads, and its "master-less" architecture makes creating and expanding clusters relatively straightforward. For organizations seeking a data store that can support rapid and massive growth, Cassandra should be high on the list of options to consider.

Cassandra 2.0 also improves response performance with "eager retries." If a given replica is slow to respond to a read request, Cassandra will send that request to other replicas if there's a chance the other replicas might respond prior to the request timeout. With version 2.0, Cassandra now handles the removal of stale index entries "lazily." In the past, stale entries were cleaned up immediately, which required a synchronization lock. The new technique avoids the throughput-constricting lock.
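To make the eager-retry idea concrete, here is a minimal Python sketch. It is a simulation, not Cassandra's actual code: the function name `speculative_read`, the fixed `retry_after_ms` threshold, and the per-replica latency list are all assumptions made for illustration. The coordinator tries one replica, and each time the wait threshold passes without an answer it sends the same read to the next replica, taking whichever response arrives first before the timeout.

```python
from typing import List, Optional, Tuple


def speculative_read(
    latencies_ms: List[int],   # hypothetical response time of each replica
    retry_after_ms: int,       # hypothetical wait before trying another replica
    timeout_ms: int,           # overall request timeout
) -> Optional[Tuple[int, int]]:
    """Return (replica_index, elapsed_ms) for the first response, or None on timeout."""
    best: Optional[Tuple[int, int]] = None
    for i, latency in enumerate(latencies_ms):
        # Replica i is only contacted after i earlier wait periods have elapsed.
        sent_at = i * retry_after_ms
        if sent_at >= timeout_ms:
            break  # no time left to send another request
        if best is not None and best[1] <= sent_at:
            break  # an answer already arrived, so no further retry is sent
        done_at = sent_at + latency
        # Keep this response only if it beats the timeout and any earlier answer.
        if done_at <= timeout_ms and (best is None or done_at < best[1]):
            best = (i, done_at)
    return best


if __name__ == "__main__":
    # The first replica is slow, so the read is retried on the second one.
    print(speculative_read([500, 20, 30], retry_after_ms=50, timeout_ms=1000))
```

In this run the slow first replica would have taken 500 ms, but the retry sent at 50 ms returns at 70 ms, so the caller sees the faster answer. In Cassandra itself this behavior is, to my knowledge, tuned per table through the `speculative_retry` property rather than written by hand.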

While Cassandra is a complicated system, its symmetrical treatment of cluster nodes makes it surprisingly easy to get up and running. The SQL-like nature of CQL is a great benefit, making it quicker and easier for developers moving from RDBMS environments to become productive.

Nevertheless, the learning curve for Cassandra is significant. It's a good idea to set up a small to modest development cluster and do plenty of experimenting, particularly with your data schema and configuration parameters. Performance issues can become significant as the application scales up.

