Using DNA to store big data
The genetic information kept at the European Bioinformatics Institute - or EBI - runs to six quadrillion bytes of data; for the geeks among you, that's six thousand million megabytes. Trying to solve the problem of storing it all led researcher Nick Goldman to think of a novel solution that stores thirty thousand times more data per gram than conventional methods like hard disks - he's using DNA! Rosalind Davis went to find out how he does this, starting with a look at what's inside the Institute's current data centre...
Nick - It looks largely like an enormous number of quite compact computers all lined up in big racks, one above the other. Other data centres will use different systems depending on what the demand for their data is. So, for example, the CERN data system is very interesting. They use a combination of hard disks and magnetic tapes. The new information that's exciting for the scientists is kept on hard disks, and after a while they move it to tape. Ours is all disk.
Rosalind - Can we go in?
Nick - Yes, absolutely. Follow me.
Rosalind - Oh wow, it's getting loud now. Okay, so we're inside the data storage centre. What have we got in the different racks?
Nick - There's a variety of machines of different ages, different size disks.
Rosalind - But all the wiring is going up to the ceiling, where there are a lot of fans. It's a bit too noisy in here, Nick, so I think we'll go back outside to continue the conversation.
Nick - Okay. So most of the noise you hear is air conditioning fans. There's a cool system where they blow cold air in down every other aisle and they suck the warmed air up in the intermediate aisle. So if you walk up and down the aisles, there's a cold aisle, a warm aisle, a cold aisle, a warm aisle, and they put the computers back to back so that the air always goes in at the front and out at the back of each machine.
Rosalind - With these disks, how long do they last?
Nick - A typical data centre policy would be a three-year maximum lifetime for a disk. After that amount of time you don't trust it any more, so even if it hasn't gone wrong yet, you'll be expecting to replace it.
Rosalind - Oh wow. That's quite often. So have you got backups of all this data?
Nick - Yes. Modern disk systems are sort of automatically self-backing up, so each disk is being partly used for data and partly used for the backup of another disk, and all the information is shared across many disks. So in everyday use, if one disk goes wrong, there's no real impact on the system. A little light comes on somewhere and they swap that disk out and put a new one in. So to some extent this renewal is always going on, but that doesn't reflect change in technologies so well, and so on a three- or four-year cycle they'll be completely replacing everything.
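The self-backing-up arrangement Nick describes can be illustrated with a parity scheme, which is one common way of spreading redundancy across disks. What follows is a minimal sketch in Python, assuming simple XOR parity over three equal-sized blocks. The interview does not say how the EBI's storage is actually laid out, and the names parity and rebuild are hypothetical.

```python
from functools import reduce


def parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover one missing block from the survivors plus the parity block."""
    return parity(surviving + [parity_block])


# Three made-up "disks" holding four bytes each (illustrative data only).
disks = [b"ACGT", b"GGCA", b"TTAG"]
parity_block = parity(disks)

# Pretend the middle disk has failed and rebuild it from the other two.
recovered = rebuild([disks[0], disks[2]], parity_block)
print(recovered == disks[1])
```

Because XOR-ing a value with itself cancels out, combining the surviving blocks with the parity block leaves only the missing one, which is why a single failed disk can be swapped out without losing data.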
Rosalind - What's the kind of financial and, I guess, the environmental carbon cost of running a centre like this?
Nick - Financially, one of the biggest budget items for the EBI each year is the cost of the computing equipment and the disks. That runs into millions of pounds a year. And the cost of doing the air conditioning on a data centre is about the same as the cost of the hardware. So, it's a very large amount of money and you can imagine for yourself what the environmental impact of using that much energy would be.
Rosalind - You've looked into a novel way of storing data to avoid this problem?
Nick - Yes, so inspired by some of the issues we had with scaling up our genome data storage facility, we were joking one day about whether there might be any other way of storing information that wouldn't be so costly, and realised that DNA itself is a fantastic medium for storing digital information.
Rosalind - So you're actually storing digital data from computers and things back on to DNA?
Nick - That's right. We devised an experiment to show that this was possible on a reasonably large scale.
Rosalind - I can imagine this is quite a complicated process. Can we go to the lab and have a look at how it works?
Nick - Yes, let's do that.
Rosalind - After entering the lab and putting on a disposable lab coat, I sat down with Nick next to a fridge full of test tubes to find out how he stores digital data on DNA.
Nick - We invented some algorithms and some codes which would start with a file on a computer, which essentially is zeros and ones, and would convert that to a format that looks like fragments of DNA, letters A, C, G and T. And when we've made the designs for the different fragments of DNA, we give those to a company, they're called Agilent, and they have the technology to make those fragments of DNA in large numbers, and large quantities of each fragment, in their laboratories there, and they send them to us in test tubes ready for us to handle in the lab.
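To give a feel for the conversion from zeros and ones to the letters A, C, G and T, here is a minimal sketch in Python. It assumes a fixed mapping of two bits per DNA letter, which is a deliberately simplified stand-in and not the code Nick's team devised. The function names and the fragment length are hypothetical, chosen only for illustration.

```python
# Assumed mapping for illustration: each pair of bits becomes one DNA letter.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn a file's bytes into a string of DNA letters."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Turn a string of DNA letters back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def fragments(dna: str, length: int) -> list[str]:
    """Cut a long sequence into short pieces of at most `length` letters."""
    return [dna[i:i + length] for i in range(0, len(dna), length)]


dna = encode(b"Hi")
print(dna)                # CAGACGGC
print(fragments(dna, 3))  # ['CAG', 'ACG', 'GC']
print(decode(dna))        # b'Hi'
```

A real scheme would also need to record where each fragment belongs in the file and to cope with errors in making and reading the DNA, neither of which this sketch attempts.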
Rosalind - They almost look empty, but Nick, you're telling me there's something in these vials?
Nick - There's a tiny drop of liquid somewhere in there, which is DNA in solution.
Rosalind - How much data can you put on DNA at the moment?
Nick - DNA is really, really tiny. It's sort of unthinkably small. In our experiments using a few megabytes of computer information, the actual quantity of DNA is essentially invisible. We've calculated that if you were to use the same system to record all the information currently held on computers in the whole world, it would be about one or two metres cubed.
Rosalind - Wow, that's tiny. Do you get somebody else to make the DNA for you? Is it a really difficult process?
Nick - At the moment, the system they use is a bit like an inkjet printer, but it's more complicated and requires very high precision, and it's currently done in clean rooms in a dedicated laboratory. It's a process that's getting increasingly important in biomedical research, to have DNA made to designs the scientists want. So we are optimistic that that will get quicker and easier and cheaper, but at the moment it's still quite a specialised process.
Rosalind - Once you've got the data and it's in the test tube, how do you read it?
Nick - So we designed the whole system so it would fit right in with the standard technologies that are currently used for genome sequencing in biology and health care experiments.
Rosalind - What would you see the applications for this kind of storage being?
Nick - Well, the first applications would be ones where people are prepared to spend a large amount of money. So that will be high value information, things that are culturally important or politically important. DNA will last hundreds or thousands of years without any intervention, so long as you keep it cool and dark. Genome scientists working in evolution have successfully extracted DNA from horses that died 700,000 years ago, and there's been some damage, but they've been able to recover essentially the whole genome sequence, so we know DNA will last that long. That wasn't even a controlled experiment, that was just a dead horse. So we are thinking about applications that would be the long term archiving of high value information.