Each time any one of the billion Facebook users visits the social networking site, the company's servers must assemble data -- user posts, likes, shares, images -- from hundreds or even thousands of different servers around the globe. The page must be created on the fly and within a few hundred milliseconds.
That is no simple task, and thus far Facebook has offered only brief glimpses of how its servers execute this challenging operation. This week, though, the company will offer an architectural overview of its data management and delivery infrastructure at the 2013 Usenix Annual Technical Conference, being held in San Jose, California.
Facebook engineer Mark Marchukov, who will give the presentation at Usenix on Wednesday, has also posted a blog entry with more details.
Because the structure -- and volume -- of the data that Facebook serves is so different from the sort typically handled by a commercial relational database, the company developed its own data store, called TAO ("The Associations and Objects"). Facebook describes TAO in the accompanying Usenix paper as "a geographically distributed, eventually consistent, graph store optimized for reads."
"Several years ago, Facebook relied entirely on an open-source stack -- Apache, MySQL, Memcache, PHP. We were very good at customizing open-source software to our needs," said Facebook engineering director Venkat Venkataramani in an interview. "But then we started thinking what a data store would look like that was built by Facebook for Facebook."
While Facebook has not yet released any of the TAO code as open source, the architectural details the company has provided could influence the development of new types of data stores and other software, in much the same way that company-published white papers on Amazon Dynamo and Google BigTable paved the way for a new generation of NoSQL databases.
The work shows the validity of the graph data model that Facebook relies on to make associations between people and events, as well as the power of distributed data management.
"Almost all enterprises work on a relational data model, but as we move to the cloud, the scalability challenges that a lot of enterprises will face in the future will be quite different than what the scenery looks like today. We may be just a little ahead of the curve there," Venkataramani said.
The TAO API (application programming interface) "makes the entire data store feel like one unified system, while on the back end, we are able to distribute it across a wide number of machines, data centers and even regions," Venkataramani said.
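As a rough illustration of that idea, the toy Python sketch below puts a single interface in front of several in-memory shards, so the caller never sees where an object or association is stored. It is not TAO's code. The class and method names (`ToyGraphStore`, `put_object`, `add_assoc` and so on) and the simple modulo-based routing are hypothetical, chosen only to show the general pattern of a unified front end over partitioned storage.

```python
class Shard:
    """One hypothetical storage node (assumption: plain in-memory dicts)."""

    def __init__(self):
        self.objects = {}   # object id -> dict of fields
        self.assocs = {}    # (source id, association type) -> list of target ids


class ToyGraphStore:
    """Single interface in front of many shards (illustrative only, not TAO)."""

    def __init__(self, num_shards=4):
        self.shards = [Shard() for _ in range(num_shards)]

    def _shard_for(self, obj_id):
        # Assumed routing rule: pick a shard from the integer id.
        return self.shards[obj_id % len(self.shards)]

    def put_object(self, obj_id, **fields):
        self._shard_for(obj_id).objects[obj_id] = fields

    def get_object(self, obj_id):
        # Returns None when the object does not exist.
        return self._shard_for(obj_id).objects.get(obj_id)

    def add_assoc(self, src_id, assoc_type, dst_id):
        # An association is stored on the shard that owns its source object.
        shard = self._shard_for(src_id)
        shard.assocs.setdefault((src_id, assoc_type), []).append(dst_id)

    def get_assocs(self, src_id, assoc_type):
        shard = self._shard_for(src_id)
        return list(shard.assocs.get((src_id, assoc_type), []))


store = ToyGraphStore(num_shards=4)
store.put_object(1, kind="user", name="Alice")
store.put_object(6, kind="post", text="Hello")
store.add_assoc(1, "authored", 6)
print(store.get_object(6))
print(store.get_assocs(1, "authored"))
```

In this sketch the user and the post land on different shards, yet the calling code reads and writes them through the same handful of methods. A production system would also need replication, caching and consistency handling, none of which is modeled here.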
TAO has been in full-scale deployment at Facebook for about two years. During peak hours, TAO can process more than 1.6 billion reads per second and 3 million writes per second.