Nick Goldman, European Bioinformatics Institute
The genetic information kept at the European Bioinformatics Institute - or EBI - runs to six quadrillion bytes of data; for the geeks among you, that's six thousand million megabytes. Trying to solve this storage problem led researcher Nick Goldman to think of a novel solution that stores thirty thousand times more data per gram than conventional methods like hard disks - he's using DNA! Rosalind Davis went to find out how he does this, starting with a look at what's inside the Institute's current data centre...
Nick - It looks largely like an enormous number of quite compact computers all lined up in big racks, one above the other. Other data centres will use different systems depending on what the demand for their data is. So, for example, the CERN data system is very interesting. They use a combination of hard disks and magnetic tapes. The new information that's exciting for the scientists is kept on hard disks; after a while they move it to tape. Ours is all disk.
Rosalind - Can we go in?
Nick - Yes, absolutely. Follow me.
Rosalind - Oh wow, it's getting loud now. Okay, so we're inside the data storage centre. What have we got in the different racks?
Nick - There's a variety of machines of different ages, different size disks.
Rosalind - And all the wiring going up to the ceiling, where there are a lot of fans. It's a bit too noisy in here, Nick, so I think we'll go back outside to continue the conversation.
Nick - Okay. So most of the noise you hear is air conditioning fans. There's a cool system where they blow cold air in down every other aisle and they suck the warmed air up in the intermediate aisle. So if you walk up and down the aisles, there's a cold aisle, a warm aisle, a cold aisle, a warm aisle, and they put the computers back to back so that the air always goes in at the front and out at the back of each machine.
Rosalind - With these disks, how long do they last?
Nick - A typical data centre policy would be a three year maximum lifetime for a disk. After that amount of time you don't trust it any more so, even if it hasn't gone wrong yet, you'll be expecting to replace it.
Rosalind - Oh wow. That's quite often. So have you got backups of all this data?
Nick - Yes. Modern disk systems are sort of automatically self-backing up, so each disk is being partly used for data and partly used for the backup of another disk, and all the information is shared across many disks. So in everyday use, if one disk goes wrong, there's no real impact on the system. A little light comes on somewhere and they swap that disk out and put a new one in. So to some extent this renewal is always going on, but that doesn't reflect changes in technology so well, and so on a three or four year cycle they'll be completely replacing everything.
Rosalind - What's the kind of financial and, I guess, the environmental carbon cost of running a centre like this?
Nick - Financially, one of the biggest budget items for the EBI each year is the cost of the computing equipment and the disks. That runs into millions of pounds a year. And the cost of doing the air conditioning on a data centre is about the same as the cost of hardware. So, it's a very large amount of money and you can imagine yourself what the environmental impact of using that much energy would be.
Rosalind - You've looked into a novel way of storing data to avoid this problem?
Nick - Yes, so inspired by some of the issues we had with scaling up our genome data storage facility, we were joking one day about any other way there would be for storing information that wouldn't be so costly, and realised that the DNA itself is a fantastic medium for storing digital information.
Rosalind - So you're actually storing digital data from computers and things back on to DNA?
Nick - That's right. We devised an experiment to show that this was possible on a reasonably large scale.
Rosalind - I can imagine this is quite a complicated process. Can we go to the lab and have a look at how it works?
Nick - Yes, let's do that.
Rosalind - After entering the lab and putting on a disposable lab coat, I sat down with Nick next to a fridge full of test tubes to find out how he stores digital data on DNA.
Nick - We invented some algorithms and some codes which would start with a file on a computer, which essentially is zeros and ones, and would convert that to a format that looks like fragments of DNA: letters A, C, G and T. And when we've made the designs for the different fragments of DNA, we give those to a company, they're called Agilent, and they have the technology to make those fragments of DNA in large numbers, and large quantities of each fragment, in their laboratories there, and they send them to us in test tubes ready for us to handle in the lab.
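To make the idea of turning zeros and ones into the letters A, C, G and T concrete, here is a minimal sketch in Python. It is an illustration only and not the coding scheme used in the experiment, which the interview does not describe: the two-bits-per-letter table, the function names `encode`, `decode` and `split_into_fragments`, and the fragment length of 100 letters are all assumptions made for this example.

```python
# Illustrative only: a naive two-bits-per-letter mapping.
# This is NOT the scheme from the experiment described in the interview.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes into a string of DNA letters, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Reverse of encode: turn DNA letters back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def split_into_fragments(dna: str, length: int = 100) -> list[str]:
    """Cut one long sequence into short pieces (length is a hypothetical choice)."""
    return [dna[i:i + length] for i in range(0, len(dna), length)]


if __name__ == "__main__":
    message = b"Hi"
    dna = encode(message)
    print(dna)          # CAGACGGC
    print(decode(dna))  # b'Hi'
```

A practical scheme would also need some way to put the fragments back in order and to cope with errors in making and reading the DNA; this sketch leaves both out.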
Rosalind - They almost look empty, but Nick, you're telling me there's something in these vials?
Nick - There's a tiny drop of liquid somewhere in there, which is DNA in solution.
Rosalind - How much data can you put on DNA at the moment?
Nick - DNA is really, really tiny. It's sort of unthinkably small. In our experiments using a few megabytes of computer information, the actual quantity of DNA is essentially invisible. We've calculated that if you were to use the same system to record all the information currently held on computers in the whole world, it would be about one or two metres cubed.
Rosalind - Wow, that's tiny. Do you get somebody else to make the DNA for you? Is it a really difficult process?
Nick - At the moment, the system they use is a bit like an inkjet printer, but it's more complicated and requires very high precision, and it's currently done in clean rooms in a dedicated laboratory. It's a process that's getting increasingly important in biomedical research, to have DNA made to designs the scientists want. So we are optimistic that that will get quicker and easier and cheaper, but at the moment it's still quite a specialised process.
Rosalind - Once you've got the data and it's in the test tube, how do you read it?
Nick - So we designed the whole system so it would fit right in with the standard technologies that are currently used for genome sequencing in biology and health care experiments.
Rosalind - What would you see the applications for this kind of storage being?
Nick - Well, the first applications would be ones where people are prepared to spend a large amount of money. So that will be high value information, things that are culturally important or politically important. DNA will last hundreds or thousands of years without any intervention, so long as you keep it cool and dark. Genome scientists working in evolution have successfully extracted DNA from horses that died 700,000 years ago, and there's been some damage but they've been able to recover essentially the whole genome sequence, so we know DNA will last that long. That wasn't even a controlled experiment, that was just a dead horse. So we are thinking about applications that would be the long term archiving of high value information.