Synthetic DNA for data storage

DNA is a robust and dense alternative for storing digital data
18 March 2022


DNA with binary data (ones and zeros)


Synthetic DNA can now store data, a new study has shown...

Every day, trillions of megabytes of data are being generated. That is nearly two million years' worth of MP3 music being generated daily. Inevitably, all this data needs to be stored.

Current forms of data storage, such as hard drives or CDs/DVDs, can only hold a limited amount of data in a given space. But with more and more data being created, higher-density storage methods are required.

One option is DNA, which stores data at a density of over 1,000 times that achievable with a hard drive. DNA is also incredibly robust and can remain readable for far longer than the lifetime of the average hard disk drive.

A strand of DNA is similar to a long chain, where each link in the DNA chain is called a nucleotide. The specific sequence of nucleotides can then be used to encode bits of data. Each bit is either a one or a zero.

As Olgica Milenkovic explains, “with the four naturally occurring nucleotides, A, T, C, G, you can store two bits per nucleotide, (0 0) for A, (0 1) for T, (1 0) for G, and (1 1) for C.” So a string of DNA reading "AGCTA" translates to the data sequence "0 0 1 0 1 1 0 1 0 0".
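The two-bit mapping in the quote can be sketched in a few lines of Python. This is only an illustration of the lookup described above, not code from the study; the names `BITS_FOR_BASE`, `dna_to_bits` and `bits_to_dna` are made up for this example.

```python
# Two-bit mapping quoted in the article: A=00, T=01, G=10, C=11.
BITS_FOR_BASE = {"A": "00", "T": "01", "G": "10", "C": "11"}
# Reverse lookup, built from the table above.
BASE_FOR_BITS = {bits: base for base, bits in BITS_FOR_BASE.items()}


def dna_to_bits(strand: str) -> str:
    """Translate a DNA string into a bit string, two bits per nucleotide."""
    return "".join(BITS_FOR_BASE[base] for base in strand)


def bits_to_dna(bits: str) -> str:
    """Translate a bit string back into DNA, one nucleotide per two bits."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be a multiple of two")
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(dna_to_bits("AGCTA"))       # 0010110100
print(bits_to_dna("0010110100"))  # AGCTA
```

Running it on "AGCTA" reproduces the ten-bit sequence given above, and decoding those bits returns the original strand.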

But if eight types of nucleotides are used, 50% more information (three bits) can be stored per nucleotide: (0 0 0), (0 0 1), (0 1 0), (0 1 1), (1 0 0), (1 0 1), (1 1 0), and (1 1 1). This means the same amount of data can be stored in a shorter string of DNA.

Now Milenkovic has added seven synthetic nucleotides alongside the four naturally-occurring ones, giving a genetic alphabet comprising eleven DNA letters rather than just four. This roughly doubles the DNA data density.
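How much each letter can carry follows from the size of the alphabet: an alphabet of k equally likely letters holds at most log2(k) bits per letter. The short Python sketch below works this out for four, eight and eleven letters. It is an idealized ceiling that ignores error correction and any constraints of a real encoding scheme, so it is not a description of the method used in the study; the function name `bits_per_letter` is invented for this example.

```python
import math


def bits_per_letter(alphabet_size: int) -> float:
    """Idealized upper bound on bits carried by one letter of the alphabet."""
    return math.log2(alphabet_size)


natural = bits_per_letter(4)  # the four natural nucleotides

for size in (4, 8, 11):
    bits = bits_per_letter(size)
    # Show the capacity per letter and the gain relative to natural DNA.
    print(f"{size:>2} letters: {bits:.2f} bits per letter, "
          f"{bits / natural:.2f}x natural DNA")
```

Under these assumptions, four letters give 2 bits each, eight give 3 bits, and eleven give about 3.46 bits, or roughly 1.7 times the capacity of natural DNA per letter.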

But once data is stored, it needs to be accessible and read. To do this, a technique called nanopore sequencing can be used. The DNA is fed through a small hole, like pearls on a necklace being pulled through a rubber sheet. As each pearl "pops" through, it produces a characteristic output signal corresponding to that genetic letter.

Though typically used to read natural DNA, the nanopore technique is also able to read the new synthetic DNA letters.

The current challenge with DNA data storage is the amount of time required to write the data. Putting together an exact sequence of DNA can take hours. So DNA solves some of our data storage problems, but there's still work to be done!


Given the arbitrary nature of computer data, what happens when (not if) a new virus DNA sequence is created because someone stored new data? Are you proposing that the storage device is quarantined? For example Covid-19 has only 29900 base pairs in its genome, you could construct that "on your hard drive", or experiment with worse.

In isolation, a genetic sequence is not harmful. The smallpox genome is decoded and stored after all. Only if that genetic sequence were placed in a situation where the instructions it encodes could be translated into a living entity would there be a problem.
