Apache Cassandra is a free, open source NoSQL database designed to manage very large data sets (think petabytes) across large clusters of commodity servers. Among many distinguishing features, Cassandra excels at scaling writes as well as reads, and its "master-less" architecture makes creating and expanding clusters relatively straightforward. For organizations seeking a data store that can support rapid and massive growth, Cassandra should be high on the list of options to consider.
Cassandra comes from an auspicious lineage. It was influenced not only by Google's Bigtable, from which it inherits its data architecture, but also Amazon's Dynamo, from which it borrows its distribution mechanisms. Like Dynamo, nodes in a Cassandra cluster are completely symmetrical, all having identical responsibilities. Cassandra also employs Dynamo-style consistent hashing to partition and replicate data. (Dynamo is Amazon's highly available key-value storage system, on which DynamoDB is based.)
How Cassandra works
Cassandra's impressive hierarchy of caching mechanisms and carefully orchestrated disk I/O ensures speed and data safety. Its storage architecture is similar to a log-structured merge tree: Write operations are sent first to a persistent commit log (ensuring a durable write), then to a write-back cache called a memtable. When the memtable fills, it is flushed to an SSTable (sorted string table) on disk. All disk writes are appends — large sequential writes, not random writes — and therefore very efficient. Periodically, the SSTable files are merged and compacted.
A Cassandra cluster is organized as a ring, and it uses a partitioning strategy to distribute data evenly. The preferred partitioner is the RandomPartitioner, which generates a 128-bit consistent hash to determine data placement. The partitioner is assisted by another component called a "snitch," which maps between a node's IP address and its physical location in a rack or data center.
When Cassandra writes data, that data is written to multiple nodes so that it remains available in the event of node failure. The nodes to which a given data element is written are called "replica nodes." Cassandra uses the snitch to ensure that the replica nodes for any particular piece of information are not in the same rack. Otherwise, if the rack were to fail, the data element and all its replica copies would be lost.
Should one or more nodes in a cluster become overutilized, Cassandra rebalances the cluster with the aid of "virtual nodes," or vnodes. An entirely logical construct, a vnode is essentially a container for a range of database rows. Because each physical node is assigned multiple vnodes, Cassandra can rebalance simply by moving a virtual node from an overloaded cluster member to less burdened members. Using virtual nodes makes load balancing more efficient because it allows Cassandra to rebalance by moving small amounts of data from multiple sources to a destination.
Sign up for CIO Asia eNewsletters.