Working with HBase
HDFS was designed on the principle that it is easier to move computation (as in a MapReduce operation) close to the data being processed than it is to move the data close to the computation. As a result, it is not in HDFS's nature to ensure that related pieces of data (say, rows in a database) are co-located. This means it's possible that a block whose data is managed by a particular RegionServer will not be stored on the same physical host as that RegionServer. However, HDFS provides mechanisms that advertise block location and — more important — perform block relocation upon request. HBase uses these mechanisms to move blocks so that that they are local to their owning RegionServer.
While HBase does not support transactions, neither is it eventually consistent; rather, HBase supports strong consistency, at least at the level of a single row. HBase has no sense of data types; everything is stored as an array of bytes. However, HBase does define a special "counter" datatype, which provides for an atomic increment operation — useful for counting views of a Web page, for example. You can increment any number of counters within a single row via a single call, and without having to lock the row. Note that counters will be synchronized for write operations (multiple writes will always execute consistent increments) but not necessarily for read operations.
The HBase shell is actually a modified, interactive Ruby shell running in JRuby, with Ruby executing in a Java VM. Anything you can do in the interactive Ruby shell you can do in the HBase shell, which means the HBase shell can be a powerful scripting environment.
The latest version of the shell provides a sort of object-oriented interface for manipulating HBase tables. You can, for example, assign a table to a JRuby variable, then issue a method on the table object using the standard dot notation. For example, if you've defined a table and assigned it to the myTable variable, you could write (put) data to the table with something like:
myTable.put '<row>', '<col>', '<v>'
This would write the value <v> into the row <row> at column <col>.
There are some third-party management GUIs for HBase, such as hbase-explorer. HBase itself includes some built-in Web-based monitoring tools. An HBase master node serves a Web interface on port 60010. Browse to it, and you'll find information about the master node itself including start time, the current Zookeeper port, a list of region servers, the average number of regions per region servers, and so on. A list of tables is also provided. Click on a table and you're shown information such as the region servers that are hosting the table's components. This page also provides controls for initiating a compaction on the table or splitting the table's regions.
Sign up for CIO Asia eNewsletters.