YouGov has turned to MongoDB's NoSQL database to handle masses of survey data, enabling the company to quickly scale its operations as it expands internationally.
The online research firm collects survey information from millions of internet users worldwide, then processes it to provide market insights for corporate clients. Each day this adds 50GB of information to its data centres in London and Palo Alto.
In 2010 YouGov began building a single database system, named Gryphon, in order to create a central repository of all of its interview data. It chose MongoDB's non-relational database technology as it was able to manage larger volumes of data in one place, which was not possible with its in-house platform, FastStore.
"The technology that underlies the interviewing systems was originally deployed in the US," said Jason Coombs, technical director at YouGov. "As we were expanding not only in volume in the US, but also bringing that technology to the European survey systems, a faster database was absolutely needed.
"We wouldn't have been able to do that without MongoDB."
He continued: "The performance of our in-house database, which was FastStore, just would not have handled that volume. And it certainly wouldn't have had the administrative advantages that MongoDB does."
While YouGov's business has been predominantly focused on the US and EMEA, it is looking to expand into new areas such as Asia Pacific. Previously it ran a separate database in each geographic region. With MongoDB, YouGov was able to create a unified database cluster that is easily accessible from, and relevant to, each region.
"In the same way that Amazon S3 is conceptually a single database in the cloud, you actually have storage in different regions," said Coombs, "and we can do the same thing with MongoDB. Our users in the US will get an experience with US-based shards of our database, and users in EMEA or the Middle East or Asia Pacific, where we are expanding now, will also get a local low-latency response."
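The arrangement Coombs describes, one logical database whose data lives on region-local shards, can be illustrated with a small toy model. The sketch below is plain Python rather than MongoDB code, and the shard names, region tags and the `Cluster` class are hypothetical, chosen only to show the routing idea; it does not reflect YouGov's actual topology or MongoDB's own API.

```python
from dataclasses import dataclass, field

# Hypothetical zone map: region tag -> shard name. These names are
# illustrative assumptions, not YouGov's real deployment.
ZONES = {
    "US": "shard-us",
    "EMEA": "shard-emea",
    "APAC": "shard-apac",
}


@dataclass
class Cluster:
    """Toy model of one logical database backed by per-region shards."""

    shards: dict = field(
        default_factory=lambda: {name: [] for name in ZONES.values()}
    )

    def shard_for(self, region: str) -> str:
        # Route on the region part of the shard key.
        return ZONES[region]

    def insert(self, doc: dict) -> str:
        # Store the document on its region's shard and report which one.
        shard = self.shard_for(doc["region"])
        self.shards[shard].append(doc)
        return shard

    def find(self, region: str) -> list:
        # A query that names a region only touches that region's shard.
        return list(self.shards[self.shard_for(region)])

    def find_all(self) -> list:
        # A query without a region fans out across every shard.
        return [doc for docs in self.shards.values() for doc in docs]
```

In this model a US respondent's record lands on `shard-us` and a regional query reads only from that shard, which is the source of the local, low-latency response, while `find_all` still presents the data as a single database.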
One of the major benefits is the ability to handle spikes in demand on databases more easily.
"We want to be able to add arbitrary amounts of data and to handle ten-fold surges in activity," said Coombs. "We have had situations where we have had ten times the number of respondents coming in and if we ever have problems, it is with systems that aren't associated with MongoDB. The systems that are running in MongoDB can surge and handle it readily."
He added: "As we invest more in sharding we expect to be able to do that even better, such that if we had a hundred-fold increase we would know what to do with the database to be able to handle that. We could scale out our applications, scale out our database servers and, with confidence, know that we would be able to handle that surge in activity."