British researchers succeed in storing data on DNA

British researchers have discovered a way to store data in the form of DNA, potentially providing a commercially viable alternative to expensive hard disks and magnetic tape.

DNA is an extremely robust way to store data, as evidenced by the information extracted from woolly mammoth bones, which date back tens of thousands of years.

It is also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy, according to Nick Goldman from the European Bioinformatics Institute (EMBL-EBI) in Hinxton.

Although reading DNA is fairly straightforward, writing has always been a major hurdle to making DNA storage a reality, because it is only possible to manufacture DNA in short strings, and both writing and reading DNA are prone to errors – particularly when the same DNA letter is repeated.

The new method involves breaking up the code into lots of overlapping fragments, with indexing information showing where each fragment belongs in the overall code, and making a coding scheme that does not allow repeats.
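As a rough illustration of the idea, the Python sketch below writes each byte as base-3 digits and maps every digit to one of the three DNA letters that differ from the previous letter, so no letter is ever repeated. It then cuts the result into overlapping, indexed fragments. This is a minimal sketch under stated assumptions, not the researchers' actual scheme: the fixed six digits per byte, the fragment length of 100 with a step of 25 (which places each letter in up to four fragments), and all function names are illustrative choices, and the index is kept as a plain number rather than being encoded into the DNA itself.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: 3**6 = 729 values, enough for one byte (256)


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def encode(data: bytes, start: str = "A") -> str:
    """Map each base-3 digit to a letter that differs from the previous one."""
    prev, seq = start, []
    for t in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]  # three allowed letters
        prev = choices[t]
        seq.append(prev)
    return "".join(seq)


def decode(dna: str, start: str = "A") -> bytes:
    """Recover the bytes by replaying the same letter choices."""
    prev, trits = start, []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits_to_bytes(trits)


def fragment(dna: str, length: int = 100, step: int = 25) -> list[tuple[int, str]]:
    """Cut the sequence into overlapping pieces, each tagged with its index."""
    frags, pos = [], 0
    while True:
        frags.append((pos // step, dna[pos:pos + length]))
        if pos + length >= len(dna):
            break
        pos += step
    return frags


def reassemble(frags: list[tuple[int, str]], step: int = 25) -> str:
    """Rebuild the sequence from indexed fragments, in any order."""
    dna = ""
    for idx, piece in sorted(frags):
        dna = dna[:idx * step] + piece
    return dna
```

Calling `decode(reassemble(fragment(encode(data))))` should return the original bytes. A real system would also have to store the index inside each DNA string and cope with missing fragments, which this sketch leaves out.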

In order for the code to fail, the same error would have to occur on four different fragments, which would be extremely rare.
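One way to picture why overlapping copies help is a simple majority vote: every position in the sequence is covered by several fragments, and the letter seen most often wins. The Python sketch below assumes each fragment arrives as an (index, letters) pair and that consecutive fragments start a fixed number of letters apart. The function name, the step of 25 and the voting rule are illustrative assumptions, not a description of the researchers' decoder.

```python
from collections import Counter


def consensus(frags, step=25):
    """Rebuild a sequence by majority vote over overlapping fragments.

    frags is an iterable of (index, letters) pairs; fragment number i is
    assumed to start at position i * step in the full sequence.
    """
    votes = {}
    for idx, piece in frags:
        for offset, base in enumerate(piece):
            position = idx * step + offset
            votes.setdefault(position, Counter())[base] += 1
    # Pick the most frequently seen letter at each position.
    return "".join(votes[p].most_common(1)[0][0] for p in sorted(votes))
```

With four copies of a position, a single misread letter is outvoted three to one. Ties are not handled specially here.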

The researchers have teamed up with California-based firm Agilent Technologies to synthesise hundreds of thousands of pieces of DNA from a number of encoded files.

The files included versions of an .mp3 of Martin Luther King’s speech, “I Have a Dream”; a .jpg photo of EMBL-EBI; a .pdf of Watson and Crick’s seminal paper, “Molecular structure of nucleic acids”; a .txt file of all of Shakespeare's sonnets; and a file that describes the encoding.

According to Emily Leproust of Agilent, the result looked like a tiny piece of dust. Agilent mailed the sample to EMBL-EBI, where the researchers were able to sequence the DNA and decode the files without errors.

The researchers claim that at least 100 million hours of high-definition video can be stored in about a cup of DNA. The breakthrough could provide a solution to the data deluge in industries such as life sciences.

“We’ve created a code that’s error tolerant using a molecular form we know will last in the right conditions for 10,000 years, or possibly longer,” said Goldman. “As long as someone knows what the code is, you will be able to read it back if you have a machine that can read DNA.”

The next step for the researchers is to perfect the coding scheme and explore practical aspects, paving the way for a commercially viable DNA storage model.

The method was published in the journal Nature on 23 January 2013.