
Facebook open-source cache squeezes more from flash disks

Joab Jackson | Oct. 10, 2013
Facebook's homebuilt Flashcache can reduce wear and tear on SSDs while boosting data throughput to users

Although designed to work with MySQL and the MySQL InnoDB database storage engine, Flashcache can be used as a general caching mechanism for Linux systems.

Flashcache can also shorten the time it takes to write data to disk, from the user's perspective, by saving newly updated data to the SSD first and writing it to the hard drives later.

The updated Flashcache module improves performance in read-write distribution, cache eviction and write efficiency.

Analyzing Flashcache's performance, Facebook found that most of its caches held a small subset of data that was read far more often than the rest.

With the previous version of Flashcache, 50 percent of a cache's contents accounted for 80 percent of disk operations. Concentrating that much activity on a small share of the cache could cause performance bottlenecks.

To improve Flashcache's read-write distribution, the engineers developed a number of techniques to automatically position the data so that cache reads are distributed more evenly across the SSD. Now 50 percent of the cache accounts for 50 percent of the disk operations.
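The 50/80 and 50/50 figures are a measure of skew: rank the cache's blocks by how many operations each receives, then ask what share of all operations the busiest half accounts for. The Python sketch here computes that metric. The function name `share_of_ops` and the per-block counts are hypothetical, chosen only so the output mirrors the before-and-after figures Facebook reported; the article does not say exactly how Facebook took its measurement.

```python
def share_of_ops(op_counts, cache_fraction=0.5):
    """Return the share of all operations handled by the busiest
    `cache_fraction` of cache blocks (0.5 means the hottest 50%).

    `op_counts` is a hypothetical list of per-block operation counts.
    """
    total = sum(op_counts)
    if total == 0:
        return 0.0
    ranked = sorted(op_counts, reverse=True)  # hottest blocks first
    k = int(len(ranked) * cache_fraction)     # how many blocks to include
    return sum(ranked[:k]) / total


# Illustrative four-block caches (made-up counts, not Facebook data).
skewed = [40, 40, 10, 10]  # two hot blocks soak up most of the traffic
even = [25, 25, 25, 25]    # traffic spread evenly across the SSD

print(share_of_ops(skewed))  # 0.8
print(share_of_ops(even))    # 0.5
```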

To improve cache eviction, the process of deciding which data to move out of the cache, Flashcache switched from a FIFO (first in, first out) algorithm, which removes the oldest data in the cache first to make room for new data, to an LRU (least recently used) algorithm, which discards the data that has gone unrequested for the longest time.
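The difference between the two policies shows up as soon as a cached block is reused. Here is a minimal Python sketch of both, built on `collections.OrderedDict`. It illustrates the general FIFO and LRU techniques only; it is not Flashcache's kernel code, and the class names are hypothetical.

```python
from collections import OrderedDict


class FIFOCache:
    """Evicts the block that was inserted earliest, however often it is read."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order == eviction order

    def get(self, key):
        # Under FIFO a hit does NOT change the eviction order.
        return self.data.get(key)

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # drop the oldest insertion
        self.data[key] = value


class LRUCache:
    """Evicts the block that has gone longest without being requested."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # front == least recently used

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # drop the least recently used
        self.data[key] = value


for cache in (FIFOCache(2), LRUCache(2)):
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # "a" is requested again, so it is "hot"
    cache.put("c", 3)  # cache is full: one block must be evicted
    print(type(cache).__name__, sorted(cache.data))
# FIFOCache ['b', 'c']
# LRUCache ['a', 'c']
```

In this sequence FIFO throws out "a" even though it was just read, while LRU keeps it and evicts "b", which had not been requested since it was written.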

Improvements were also made in write efficiency.

Previously, the software wrote to disk only once it had accumulated a certain amount of data ready to be written. However, this resulted in uneven performance across different caches. So Facebook engineers developed an approach that writes cached data to disk whenever a user requests a copy of that data, which produces a smoother flow of write operations.
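The article does not detail how the new write policy is implemented, so what follows is only a toy Python model under stated assumptions: a write-back cache that stores writes on the SSD and copies them to the hard drive later, with two hypothetical cleaning triggers. `"threshold"` flushes every pending block once `dirty_limit` of them have accumulated, roughly matching the old behavior described above; `"on_access"` flushes a pending block when it is requested again, roughly matching the new one. The class name, policy names and the `dirty_limit` value are all illustrative assumptions, not Flashcache settings.

```python
class WriteBackCache:
    """Toy write-back cache: writes land on the fast tier (SSD) and are
    copied to the slow tier (HDD) later, according to `policy`."""

    def __init__(self, policy, dirty_limit=4):
        self.policy = policy          # "threshold" or "on_access" (hypothetical)
        self.dirty_limit = dirty_limit
        self.ssd = {}                 # block id -> data on the fast tier
        self.hdd = {}                 # block id -> data on the slow tier
        self.dirty = set()            # blocks newer on SSD than on HDD
        self.flush_sizes = []         # blocks written to HDD per flush

    def write(self, block, data):
        self.ssd[block] = data
        self.dirty.add(block)
        # Old-style trigger: wait until enough dirty data piles up.
        if self.policy == "threshold" and len(self.dirty) >= self.dirty_limit:
            self._flush(sorted(self.dirty))

    def read(self, block):
        # New-style trigger: clean a dirty block when it is requested.
        if self.policy == "on_access" and block in self.dirty:
            self._flush([block])
        return self.ssd.get(block, self.hdd.get(block))

    def _flush(self, blocks):
        for b in blocks:
            self.hdd[b] = self.ssd[b]
            self.dirty.discard(b)
        self.flush_sizes.append(len(blocks))


for policy in ("threshold", "on_access"):
    cache = WriteBackCache(policy)
    for i in range(8):
        cache.write(i, f"v{i}")
        cache.read(i)  # each block is requested again after being written
    print(policy, cache.flush_sizes)
```

Running the loop prints `threshold [4, 4]` and `on_access [1, 1, 1, 1, 1, 1, 1, 1]`: the same eight writes reach the disk either in two bursts or as a steady trickle. One caveat of this toy model is that under `"on_access"` a block that is written but never read again stays unflushed, so a practical cache would presumably still need some fallback cleaning mechanism.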

Thanks to these improvements, the updated caching mechanism has an average hit rate (the share of user requests that can be answered with data already in the cache) of 80 percent, up from 60 percent in the previous version. This means more data is served more quickly.
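A short worked calculation shows why a 20-point rise in hit rate matters. Writing the hit rate as h and the miss rate as m, and assuming every miss must be served from the hard drives, the share of requests that fall through to disk drops from 40 percent to 20 percent, which halves it:

```latex
h = \frac{N_{\text{hit}}}{N_{\text{hit}} + N_{\text{miss}}}, \qquad m = 1 - h

m_{\text{old}} = 1 - 0.60 = 0.40, \qquad m_{\text{new}} = 1 - 0.80 = 0.20

\frac{m_{\text{new}}}{m_{\text{old}}} = \frac{0.20}{0.40} = 0.5
```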

The update has also cut the server I/O (input/output) needed to read data by 40 percent and the I/O needed to write data by 75 percent. For a company running thousands of servers, such a reduction in traffic can help it use its servers more efficiently and keep hardware costs manageable.

 

 
