Movie, malware and Amazon gift card data stored as DNA, and retrieved

03 March 2017


The world is generating data at an exponentially rising rate. In fact, humanity has produced more electronic information in the last 5 years alone than in all the rest of its history.

This data deluge is causing a big headache: where to put all this information and how to keep it safe - and readable - in the long term?

One possible answer is to use the information storage molecule that our own bodies rely upon: DNA. It packs a lot of data into a very small space and, given that DNA sequences can still be decoded from half-million-year-old fossils, it's safe to say it has a proven track record of resilience and durability.

This week, scientists in the US announced a new method for storing data in DNA at a density hundreds of times higher than previously achieved.

Writing in Science, Columbia University's Yaniv Erlich dubs his method a "DNA fountain" and has used it to store - and faithfully retrieve - about 2 megabytes of data, including a computer operating system, a movie and an Amazon gift card. He also included some malware for good measure.

The new system achieves a storage density hundreds of times higher than earlier DNA storage efforts, and it does so with perfect data integrity, which has been a stumbling block for other methods.

It works by first converting all of the data into a binary sequence of zeros and ones. This sequence is then divided into a series of short, equal-length segments - in this case 67,088 binary chunks. A small, random selection of these chunks is then added together, and a short "seed" code identifying which chunks were combined is attached to the front.
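The chunk-and-combine step above can be sketched in a few lines of Python. This is a simplified illustration rather than Erlich's actual code: it assumes that "adding together" means a bitwise XOR (the usual choice in fountain codes), it picks between one and three chunks uniformly at random where a real fountain code would use a carefully tuned distribution, and the function names, the 4-byte seed and the chunk size are all made up for the example.

```python
import random

def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into equal-length chunks, zero-padding the final one."""
    padded_len = -(-len(data) // chunk_size) * chunk_size  # round up
    padded = data.ljust(padded_len, b"\x00")
    return [padded[i:i + chunk_size] for i in range(0, padded_len, chunk_size)]

def choose_indices(seed: int, n_chunks: int, max_degree: int = 3) -> list[int]:
    """Use the seed to pick a small pseudo-random set of chunk indices.

    Assumption: the number of chunks is drawn uniformly from 1..max_degree.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, min(max_degree, n_chunks))
    return sorted(rng.sample(range(n_chunks), degree))

def make_droplet(chunks: list[bytes], seed: int, seed_bytes: int = 4) -> bytes:
    """Combine the seed-selected chunks with XOR and put the seed in front."""
    payload = bytes(len(chunks[0]))  # start from all zeros
    for i in choose_indices(seed, len(chunks)):
        payload = bytes(a ^ b for a, b in zip(payload, chunks[i]))
    return seed.to_bytes(seed_bytes, "big") + payload

chunks = split_into_chunks(b"hello world!", chunk_size=4)
droplet = make_droplet(chunks, seed=42)
```

Because the seed alone determines which chunks were combined, a decoder that reads the seed off the front of the sequence can rerun `choose_indices` and work out what each sequence contains.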

The resulting binary sequence is then converted into a string of DNA letters, with each pair of binary digits becoming one letter: 00 is expressed as a DNA letter A (adenine), 01 is a letter C (cytosine), 10 is a G (guanine) and 11 is T (thymine). A redundancy of 7% was built into the system, meaning that 7% of the data was recorded more than once, making it more resilient to errors.
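The two-bits-per-letter mapping described above is simple enough to write out directly. The sketch below uses exactly the table given in the text (00 to A, 01 to C, 10 to G, 11 to T); the function names are invented for the example, and it leaves out any checks a real system might apply to the resulting sequences.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def bytes_to_dna(data: bytes) -> str:
    """Turn each byte into 8 bits, then each pair of bits into one DNA letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bytes(dna: str) -> bytes:
    """Reverse the mapping: DNA letters back to bits, then bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(bytes_to_dna(b"\x1b"))                 # byte 00011011 -> ACGT
print(dna_to_bytes(bytes_to_dna(b"DNA")))    # round trip gives b'DNA'
```

Each byte becomes four DNA letters, so a 2 megabyte file corresponds to roughly 8 million letters before any seed codes or redundancy are added.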

These DNA sequences were synthesised and then "read back" using standard DNA sequencing techniques. A computer program then converted the DNA code back into the source binary data, without any errors. The data density achieved was 215 petabytes (215,000 terabytes) per gram, which is orders of magnitude better than a hard disk and close to the theoretical limit of what we think can be achieved with DNA storage.

The downside is that it took nearly 10 minutes to turn the DNA data back into binary, and the overall storage cost worked out at more than $3,500 per megabyte. Putting that into perspective, you can pick up a 3 terabyte hard disk from a hardware store for about £100 at the moment. But, as advancing technology brings the cost down, "DNA might become an economically viable solution for long-term, high-latency storage," Erlich speculates.
