The researchers said they developed "a novel approach" to convert the long strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences -- adenine, guanine, cytosine and thymine -- represented as As, Gs, Cs and Ts.
The digital data is broken down into pieces and stored by synthesizing it as a massive number of tiny DNA molecules, which can be dehydrated and preserved for long-term storage.
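As a rough illustration of the idea, the sketch below maps every two bits to one of the four bases and then cuts the result into short pieces. The mapping table, the function names and the piece length are assumptions made for this example; the article does not describe the team's actual encoding.

```python
# Hypothetical mapping: two bits per base. The researchers' real code is
# not specified in the article and may differ.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Write each byte as 8 bits, then turn each pair of bits into one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Reverse the mapping: bases back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def split_into_pieces(strand: str, length: int) -> list:
    """Break one long sequence into short pieces, one per synthesized molecule."""
    return [strand[i:i + length] for i in range(0, len(strand), length)]


dna = bytes_to_dna(b"Hi")
print(dna)                        # CAGACGGC
print(split_into_pieces(dna, 3))  # ['CAG', 'ACG', 'GC']
print(dna_to_bytes(dna))          # b'Hi'
```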
While advances in DNA storage rely on techniques pioneered by the biotechnology industry, they also require lessons learned from information technology. For example, the Microsoft and UW team's encoding approach borrows error correction schemes commonly used in computer memory.
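The article does not name the specific scheme the team uses. As a stand-in, the sketch below shows a Hamming(7,4) code, a classic memory-style error-correcting code that adds three parity bits to every four data bits so that any single flipped bit can be located and repaired. The function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword (parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Repair at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error was detected
    if error_position:
        c[error_position - 1] ^= 1  # flip the damaged bit back
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
sent = hamming74_encode(word)
damaged = sent.copy()
damaged[4] ^= 1  # simulate a single read or write error
print(hamming74_decode(damaged) == word)  # True
```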
"This is an example where we're borrowing something from nature -- DNA -- to store information. But we're using something we know from computers -- how to correct memory errors -- and applying that back to nature," said Luis Henrique Ceze, a UW associate professor of computer science and engineering and the university's principal researcher on the project.
To access the stored data, the researchers encode the equivalent of zip codes and street addresses into the DNA sequences. Polymerase Chain Reaction (PCR) techniques -- commonly used in molecular biology -- help them more easily identify the zip codes they are looking for.
Using DNA sequencing techniques, the researchers can then read the data and convert it back to a video, image or document file by using the street addresses to reorder the data.
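The zip code and street address idea can be sketched in a few lines. In this simplified model, each stored piece carries a file key (the "zip code") and a position (the "street address"); a filter on the key stands in for PCR selection, and a sort on the position restores the original order. The tuple layout and function names are assumptions for illustration, not the researchers' format.

```python
import random


def tag_pieces(file_key, payload, piece_len):
    """Split a payload into pieces tagged with (file key, position)."""
    return [
        (file_key, index, payload[start:start + piece_len])
        for index, start in enumerate(range(0, len(payload), piece_len))
    ]


def retrieve(pool, file_key):
    """Select one file's pieces from a mixed pool and reassemble them in order."""
    selected = [piece for piece in pool if piece[0] == file_key]  # stands in for PCR
    selected.sort(key=lambda piece: piece[1])  # reorder by "street address"
    return "".join(piece[2] for piece in selected)


# Two files share one unordered pool, as molecules would in a single sample.
pool = tag_pieces("video", "ACGTACGTTTGA", 4) + tag_pieces("photo", "GGCCAATT", 4)
random.shuffle(pool)

print(retrieve(pool, "video"))  # ACGTACGTTTGA
print(retrieve(pool, "photo"))  # GGCCAATT
```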
Most of the world's data today is stored on magnetic and optical media. Tape technology has recently seen significant density improvements, with tape cartridges as large as 185TB, and is the densest form of storage available commercially today, at about 10GB per cubic millimeter (mm³). Recent research reported the feasibility of optical discs capable of storing 1PB, yielding a density of about 100GB/mm³. Despite this improvement, storing zettabytes of data would still take millions of units and use significant physical space.
A depiction of a DNA double helix. (Image: National Human Genome Research Institute)
DNA has a theoretical limit above one exabyte per cubic millimeter, which is eight orders of magnitude denser than tape. DNA-based storage also has the benefit of eternal relevance: As long as there is DNA-based life, there will be strong reasons to read and manipulate DNA, the researchers stated in an April research paper.
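The density comparison and the "millions of units" claim can be checked with simple arithmetic on the figures quoted above. The sketch assumes decimal units (1GB = 10^9 bytes) and uses the same unit of volume for both densities, so it cancels out in the ratio.

```python
import math

GB, TB, PB, EB, ZB = 10**9, 10**12, 10**15, 10**18, 10**21

tape_density = 10 * GB  # bytes per unit of volume, as quoted for tape
dna_density = 1 * EB    # theoretical lower bound quoted for DNA

ratio = dna_density // tape_density
orders_of_magnitude = round(math.log10(ratio))
print(orders_of_magnitude)  # 8

# Units needed to hold a single zettabyte on today's densest media.
tape_cartridges_per_zb = ZB / (185 * TB)
optical_discs_per_zb = ZB / (1 * PB)
print(round(tape_cartridges_per_zb))  # 5405405
print(round(optical_discs_per_zb))    # 1000000
```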
According to the ongoing "Digital Universe" study by IDC and EMC, the amount of data is forecast to grow to over 16 zettabytes (ZB) in 2017. The Internet of Things, in large part, will be responsible for doubling digital data every two years, resulting in 44 trillion gigabytes (44ZB) by 2020.
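As a quick sanity check, the two forecast figures are roughly consistent with a two-year doubling time. The helper below is a hypothetical illustration of that arithmetic, not part of the IDC and EMC study.

```python
def project_zettabytes(start_zb, years, doubling_years=2.0):
    """Grow a starting volume that doubles every `doubling_years` years."""
    return start_zb * 2 ** (years / doubling_years)


# Starting from roughly 16ZB in 2017, three years later (2020):
estimate_2020 = project_zettabytes(16, 3)
print(round(estimate_2020, 1))  # 45.3, close to the 44ZB forecast
```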