Much like the ongoing rumblings in the consumer market over the relative merits of rivals iOS and Android, the SQL/NoSQL face-off occupying the world's database developers, (and giving rise to countless blogs and online forums) shows no sign of abating. In the past few years, there has been increasing interest in the topic and the recent buzz about sentiment trading has only added fuel to the NoSQL fire.
Of course, although the topic is often labelled as SQL/NoSQL, the real underlying issue is the comparison between relational and non-relational databases. Indeed, we could even class it as traditional databases versus new databases such as Cassandra and MongoDB. It has earned this title because SQL is the lingua franca of relational databases, but is little used in non-relational databases.
In financial markets, the primary requirement of the database is for storing and analysing very high volumes of financial market data, with best possible performance. Therefore, the key to identifying the best possible database for your firm's needs is to establish the relative importance of performance and cost.
Database performance is critical to algorithmic trading, risk management, complex event processing, compliance and reference data management. Financial market databases typically have simple structures that are already well handled by the tabular format of traditional SQL databases. Where there are hundreds of millions of records that follow a relatively standard format, it is possible to make certain assumptions that can give better performance by orders of magnitude. In this case, the NoSQL databases offer no advantage, and not only will they have higher latency and comparatively poor performance, but they may also lack the required features such as support for specialised data types.
One obvious consideration is how the data itself may be stored. For example, join and query performance depends on being able to access values in specific columns very efficiently, so is much better when the underlying data is stored by column rather than by record. This benefit extends under the covers: corporate adjustments can be applied on the fly as data is read in, there is no need for special indices or complex query optimisers, and since columns usually have identical data type throughout, they can be very efficiently compressed. In contrast, key-value store is doubly slow - not only is data stored by record, but each record itself may have a complicated structure.
Also, in the typical database use for market data and applications such as network monitoring, a single server machine could handle an entire database. For example, the NYSE TAQ feed currently has around a billion records per day (down from a peak of more than two billion). This can easily be handled by a single machine, and indeed the entire day's feed can be stored in memory. This is the preferred setup, since it means that the machine can respond immediately to queries, instead of having to farm out queries over a network.
Sign up for CIO Asia eNewsletters.