Keeping Your Genome Safe

Sequencing your genes may mean better treatment but what happens to your genome once it has been sequenced? How is it stored?
18 August 2014

Interview with 

Anna Middleton and Guy Coates, Sanger Institute


Data Room at the Sanger Institute


Sequencing your genes may mean better diagnosis and even treatment, but what happens to your genome once it has been sequenced? Where does all this data go and how is it stored? And most importantly, is it safe? Back at the Sanger Institute, Graihagh Jackson spoke to Dr Anna Middleton and Dr Guy Coates, one of the people charged with protecting the thousands of genomes sequenced at the facility.

Graihagh - So, we're in the middle of the data storage room here.  We're surrounded by aircon and all sorts of flashing lights and brightly coloured cables.  What's the function of this room?

Guy -   The science that we do is all about big data.  What we want to do is sequence many, many different genomes of many different people and then analyse them.  In order to do that, we need a lot of storage.

Graihagh -   How much data are we talking about here?

Guy -   We store about 20 petabytes of data.  So, if you think about the hard disk in your laptop, which is about a terabyte, a petabyte is a thousand terabytes.  So, we're holding roughly 20,000 times the amount of data you have on your machine at home.
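The comparison Guy makes can be checked with a few lines of arithmetic. This is only an illustrative sketch: it assumes decimal units (1 terabyte = 10^12 bytes) and takes the "about 20 petabytes" and "about a terabyte" figures from the interview at face value. The variable names are made up for the example.

```python
# Decimal (SI) storage units, assumed here for illustration.
TERABYTE = 10**12            # bytes
PETABYTE = 1000 * TERABYTE   # "a petabyte is a thousand terabytes"

stored_bytes = 20 * PETABYTE  # approximate figure quoted in the interview
laptop_bytes = 1 * TERABYTE   # the laptop disk used for comparison

# How many laptop disks would hold the same amount of data?
ratio = stored_bytes // laptop_bytes
print(f"{ratio:,} laptop disks")  # prints "20,000 laptop disks"
```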

Graihagh -   What does that translate into?

Guy -   A typical genome is probably of the order of 3 gigabytes, so it's many tens of thousands of genomes.

Graihagh - How do you keep this secure?

Guy -   The Sanger Institute actually has a very open data policy.  So, most of the sequencing data that is generated here is made available on the internet for other researchers to be able to download.

Graihagh -   Does that mean anyone can have access and pinpoint someone's sequence or genome?

Guy -   In principle, if the data has been consented for release then yes.  Anyone can come along to our website and download as much of the data as they can handle.  We do have some datasets which involve confidential patient information.  Researchers who want to get access to that data have to come and submit proposals to an ethics review board, who will decide.

Graihagh -   Have there ever been any incidents where data has accidentally been leaked or hacked?

Guy -   Not from us.  There have been incidents where people have taken what looks like anonymised data and then been able to combine that with third party datasets.  One of the classic examples has been a study where someone in the US has been able to take genetic information, which wasn't even a full genome scan, and then link that back in some circumstances to surnames.  Now, that's not complete re-identification but it shows how careful we have to be with genetic data.

Graihagh -   So, if your genes can be traced back to your family, what could the outside world learn about you?  Dr Anna Middleton, Senior Scientist at the Sanger Institute and Gene Ethics expert...

Anna - So, you can tell somebody's past, present and future from their genes, ranging from the age at which they would have been predisposed to get their first tooth, through to whether they're predisposed to high cholesterol, through to whether they have a higher chance than average of getting cancer.  So, a whole collection of different things.

Graihagh -   It sounds like a never ending list of possibilities.

Anna -   Yeah.  Actually, virtually everything about us has some pathway in us that links to our genes.  So, together with the environment, genes make us who we are.

Graihagh -   You can potentially glean quite a lot of information about someone.  So, how is that kept anonymous?

Anna -   As far as identifying an individual goes, that would actually be very difficult.  If you managed to get somebody sequenced and you had their raw As, Cs, Ts and Gs, the bits that make up the sequence, you couldn't look at that and go, "Oh, I've just identified that person from that."  You can't do that.  You need a way of interpreting it.  If that individual had other things online that identified them, such as their photographs or links to a particular condition, you could in theory match them all together, but it would be quite a complex exercise.  So, what we do is try to keep the data we have on campus as safe as possible.  We use the same sort of systems that banks do to protect identifiable data and we're just as cautious as we can be.

Graihagh -   Your genome may just about be safe from prying eyes for now, but what happens if you consent for someone to look at a specific gene and they accidentally see another, one that's life threatening but curable?

Anna -   In a research setting, if incidental findings are discovered there isn't a duty to share those.  Now, that's actually quite different from a clinical setting.  So, if you had an x-ray of your lung and then picked up an unexpected rib fracture then you would expect the doctor to explore that and share that.  The same is expected in genomics.  If you genuinely saw something accidental that you weren't expecting then there would be an expectation to share that, particularly if it was very serious and potentially life threatening, and also actionable.  So, if we pick up genes relating to even serious conditions, if you can do nothing about it then really, what's the point in knowing?

Graihagh -   So, if it's incurable, that's not something that would be communicated to a patient.

Anna -   So for example, if you were looking at, say, a 2-year-old with a severe developmental disorder, you would be looking to try and find the genetics behind that developmental disorder.  So, how helpful is it to then be looking at the Alzheimer's genes that wouldn't even be relevant until that child was an adult?

Graihagh -   If you're looking at sequencing the gene, is it a case of the gene just saying, "This means you have Alzheimer's"?

Anna -   So, the vast majority of genes that we have give some level of prediction about the future, but it's very difficult to be precise.  So, when you're discovering things, incidental findings say, in a research setting or even in a clinical setting, it's very hard to actually interpret them without clinical data attached to them.

Graihagh -   Is there a standard policy across the world that people can look to and follow, should they find themselves in these circumstances?

Anna -   Generically, if we were to look across the world at what people are doing, I'd say in the UK and in Europe more generally, we're being more conservative about incidental findings and we're saying, "Okay, the information that you can get from a whole sequence is vast, so what do we look at and share?"  Is it just information relating to serious actionable conditions?  What about information relating to pregnancy or carrier testing?  What about response to medications?  All these different genes play a part in these different things.  At the moment, we're saying it's actually very hard to manage all of that.  What we'll do is really just focus on answering the clinical question.  So, if somebody comes in with breast cancer, we'll really only just look at the breast cancer genes to find out what chemotherapies are going to be most helpful for them and look at trying to support them with their cancer.  Let's not think about all these other pieces of information, perhaps until another date.  That's very different from what's going on in the States at the moment.  There, they're recommending automatically looking for 24 cancer and cardiac conditions every time a sequence is done.  So, they're using that opportunity to do a screen of other things at the same time and that gives a very strong public health message.  So, it's just a different kind of approach.  So, we're more conservative here.  It may well be we go to the American position at some point, but we just don't quite feel ready yet.

Graihagh -   When are we likely to be able to see this sort of wide scale access to your genomic sequence?

Anna -   The 100,000 Genomes Project has started.  So, this is a massive government initiative to sequence 100,000 people in the NHS, so it wouldn't be available for everybody.  But the government has really made a massive commitment to create the infrastructure to support sequencing on a large scale.

