Cassandra lowers the barriers to big data

Rick Grehan | March 25, 2014
Apache Cassandra is a free, open source NoSQL database designed to manage very large data sets (think petabytes) across large clusters of commodity servers. Among many distinguishing features, Cassandra excels at scaling writes as well as reads, and its "master-less" architecture makes creating and expanding clusters relatively straightforward. For organizations seeking a data store that can support rapid and massive growth, Cassandra should be high on the list of options to consider.

Installing Cassandra is reasonably straightforward, particularly if you download the DataStax Community edition, which bundles a Web-based management application called OpsCenter. I downloaded and installed the tarball version of Cassandra on my Ubuntu Linux system (the apt-get version for some reason refused to install) and found that the real work lies in configuring a Cassandra cluster. The cassandra.yaml file holds scads of tunable parameters for the node and its cluster.

For example, you can set the number of tokens that will be assigned to the node, which controls the proportion of data (relative to other nodes) that the node will be responsible for. (This is useful if your cluster is composed of heterogeneous hardware because more powerful members can be configured to handle heavier loads.) Happily, for a small trial installation, you need only configure the listening IP address for the current node and the IP addresses of the cluster's seed nodes.
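To make the token-to-load relationship concrete, here is a minimal Python sketch. It assumes each node's expected share of the data is simply its token count divided by the cluster-wide total; in a real cluster the split is only approximate. The node names and token counts are hypothetical, and `expected_share` is an illustrative helper, not part of Cassandra.

```python
from fractions import Fraction


def expected_share(num_tokens_by_node):
    """Return each node's approximate share of the data.

    Assumption: share is proportional to the node's token count.
    """
    total = sum(num_tokens_by_node.values())
    if total <= 0:
        raise ValueError("cluster must own at least one token")
    return {
        node: Fraction(tokens, total)
        for node, tokens in num_tokens_by_node.items()
    }


# Hypothetical three-node cluster: node-c is the more powerful machine,
# so it is given twice as many tokens as the other two.
cluster = {"node-a": 256, "node-b": 256, "node-c": 512}
shares = expected_share(cluster)

for node, share in shares.items():
    print(f"{node}: {float(share):.0%}")
# node-a: 25%
# node-b: 25%
# node-c: 50%
```

Doubling a node's token count relative to its peers roughly doubles the data it is asked to hold, which is why the setting suits heterogeneous hardware.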

OpsCenter runs a server process on your management host that communicates with agent processes executing on the cluster's nodes. The agents gather usage and performance information and send it to the server, which provides a browser-based user interface for viewing the aggregated results. With OpsCenter, you can browse data, examine throughput graphs, manage column families, initiate cluster rebalancing, and so on. (As an aside, I was unable to get OpsCenter working successfully on my Linux installation. The DataStax Community Edition installation on Windows worked only partially: OpsCenter could not connect to the agent service.)

While documentation — primarily in the form of FAQs, wikis, and blogs — exists on the Apache Cassandra site and the Planet Cassandra site, DataStax is the most comprehensive source for Cassandra documentation and tutorials. In fact, Planet Cassandra's Getting Started page more or less points you to the DataStax pages.

DataStax maintains documentation for both current and previous versions, so as Cassandra is updated you can still troubleshoot any earlier installations you continue to run. The Web pages are well hyperlinked and provide plenty of diagrams. Along with video tutorials, you'll also find reference guides for Java and C# drivers, as well as developer blogs on Cassandra internals.

Until recently, Cassandra provided no transactional capabilities. However, the latest release of Cassandra (version 2.0) adds "lightweight transactions" that employ an atomic "compare and set" operation. In CQL, this is manifested as a conditional IF clause on INSERT and UPDATE commands: the data is modified only if the stated condition is true. You can imagine a CQL INSERT statement that adds a new row only if the row does not already exist; the transactional IF test guarantees that the existence check and the INSERT happen as a single atomic step.
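The compare-and-set idea can be sketched in a few lines of Python. This is a single-process analogy only: `TinyTable` and its methods are hypothetical names invented for illustration, and a lock stands in for the coordination Cassandra performs across replicas. The two methods mirror the behavior of CQL statements of the form `INSERT ... IF NOT EXISTS` and `UPDATE ... IF column = value`.

```python
import threading


class TinyTable:
    """Toy in-memory table with conditional writes (illustration only)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # stand-in for cluster-wide agreement

    def insert_if_not_exists(self, key, row):
        """Insert the row only if no row with this key exists.

        Returns (applied, existing_row). The check and the write happen
        under one lock, so no other writer can slip in between them.
        """
        with self._lock:
            if key in self._rows:
                return False, dict(self._rows[key])
            self._rows[key] = dict(row)
            return True, None

    def update_if(self, key, column, expected, new_value):
        """Set column to new_value only if it currently equals expected."""
        with self._lock:
            current = self._rows.get(key)
            if current is None or current.get(column) != expected:
                return False
            current[column] = new_value
            return True


users = TinyTable()
first = users.insert_if_not_exists("rick", {"email": "a@example.com"})
second = users.insert_if_not_exists("rick", {"email": "b@example.com"})
print(first[0], second[0])  # True False

# The update applies once; a second attempt with the stale value is refused.
print(users.update_if("rick", "email", "a@example.com", "c@example.com"))  # True
print(users.update_if("rick", "email", "a@example.com", "d@example.com"))  # False
```

The second insert is rejected and returns the row that already exists, so the caller can tell that it lost the race instead of silently overwriting the first writer's data.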

 
