Drowning in data
Before doctors can use genetic data to diagnose and treat disease, they need to know exactly what the myriad different variations in the genome mean for sickness and health. And as the amount of genetic data expands exponentially, that problem is only getting bigger. Kaitlin Samocha, from the Wellcome Trust Sanger Institute, is trying to get to grips with the data deluge and sift the signal from the noise - as she explained to Ginny Smith...
Kaitlin - One of the big questions within genetics is being able to understand any genetic changes that we see in individuals, and what you need to be able to do to understand the change in one individual is to look at many, many individuals. The reason this is important is that all of us have a lot of changes in our genome. And so, we really need to be able to separate the changes that are important from the changes that are simply there because you're human, because everyone has them.
There is a subset of changes within each individual that might be contributing to differences in their height or their weight. And we want to be able to determine which changes are contributing to the height difference and which changes aren't. And so, in order to be able to understand that, it’s best to look at many people.
The work I've been doing is looking at many people to come up with ways to separate out the signal from the noise, the interesting changes from the not-so-interesting changes, if you want to summarise it that way.
Ginny - And how are you applying this to healthcare?
Kaitlin - So, the application to healthcare is that we know genetic changes can influence your risk for disease or a particular disease that you have. If you look for example at children with developmental disorders such as severe intellectual disability, we know that there are genetic contributions to that.
What we want to be able to do is when we look at one of these children and we have all of their genetic changes, we want to be able to find the subset of changes that appear to be contributing to their developmental disorder as opposed to contributing for example to their height.
Ginny - So there’ll be some differences in the genome of someone with a disease that are causing the disease perhaps, but then there’ll be other differences that are just at random. Is that the idea?
Kaitlin - Yes, that’s exactly the idea. There are plenty of changes within every individual that don’t have a major impact at all. They seem to be relatively neutral in terms of the way that the individual acts or operates overall.
Ginny - If you’ve got, say, a sample with a disease and you find a change in their DNA, how do you then work out if that is the change that’s causing a problem or just something that’s come up at random?
Kaitlin - That’s an excellent question. So you have an individual and you have all their variation. The best way to understand any changes you’ve seen in an individual is to look at a population. So, you can check, for example, if anyone else with a very similar disease has that exact same change. And that is one of the ways that you can say, “Well, if two people with the exact same disease have the exact same change, that seems interesting.”
Another way to do it is to look at relatively healthy people – the general population – and to say, “Let’s find areas that no one has a change in.” Say you have a particular gene that makes a specific protein: if most of the population don’t have any changes in this gene at all, but your patient has a change, that indicates that most people seem to be intolerant of any change here, so this change might be more likely to be leading to their disorder.
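The two filters Kaitlin describes (is the change shared by another patient with the same disease, and does it fall in a gene where healthy people carry no changes at all?) can be sketched in a few lines. This is a toy illustration only, not her actual method: the data layout (each person as a list of `(gene, variant)` pairs), the gene names, and the function names `carriers_per_gene` and `prioritise` are all hypothetical. Real analyses are considerably more careful, for example accounting for how many changes a gene would be expected to carry by chance given its size.

```python
from collections import Counter

# Hypothetical layout: each person is a list of (gene, variant_id) pairs.

def carriers_per_gene(healthy_cohort):
    """Count how many healthy people carry at least one change in each gene."""
    counts = Counter()
    for person in healthy_cohort:
        for gene in {gene for gene, _ in person}:  # count each person once per gene
            counts[gene] += 1
    return counts

def prioritise(patient, other_patients, healthy_cohort):
    """Flag a patient's changes that pass either filter from the interview."""
    healthy_counts = carriers_per_gene(healthy_cohort)
    seen_in_patients = {v for p in other_patients for v in p}
    flagged = []
    for gene, variant in patient:
        shared = (gene, variant) in seen_in_patients  # same change in another patient
        no_healthy_changes = healthy_counts[gene] == 0  # Counter gives 0 for unseen genes
        if shared or no_healthy_changes:
            flagged.append((gene, variant, shared, no_healthy_changes))
    return flagged

# Toy data with made-up gene names, not real results.
patient = [("GENE_A", "v1"), ("GENE_B", "v2"), ("GENE_C", "v3")]
other_patients = [[("GENE_C", "v3")]]
healthy = [[("GENE_B", "v2")], [("GENE_B", "v8"), ("GENE_C", "v5")]]

print(prioritise(patient, other_patients, healthy))
# [('GENE_A', 'v1', False, True), ('GENE_C', 'v3', True, False)]
```

Here GENE_B's change is dropped because healthy people carry changes in that gene, while the other two are kept for a closer look, each for a different reason.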
Ginny - But am I right in thinking that, for most diseases, it’s not as simple as a single gene that will have a change? It’ll be a combination of genes.
Kaitlin - Correct. That’s what makes it even harder – every individual for many things will have a variety of changes, multiple changes that are all contributing. And so, what we’re trying to do is help highlight the areas or the specific changes that appear to be more likely to be important. And so, you could do this across the board and if someone has five changes that are all never seen in a healthy person, those five changes might be working together in order to lead to the disorder. One of the main challenges today is not only to highlight changes that look interesting but to figure out how they're working together to lead to some of these different disease states.
Ginny - Now this sounds to me like it might be as much of a data issue as it is a biology issue. How do you handle the quantity of information you must have to deal with on a daily basis?
Kaitlin - We have some very nice computers! And actually, increasingly, people are moving to cloud-based systems. So instead of storing, for example, all living individuals’ changes in some text file on your computer, we store them across many computers that you can access from many different places in the world. This allows researchers not only at the Sanger Institute, where I'm based, but also researchers in Boston, in Germany, or a variety of other places in the world to all contribute to understanding the different changes and how they work together.
Yes, this is becoming a computational issue, and increasingly these days we’re shifting away from looking at an individual person on their own and trying to solve one thing at a time. We’re looking at large-scale data all together to understand it. So yes, programming and computer skills are increasingly important in biology these days.
Ginny - And what kind of numbers are we talking about in these population studies?
Kaitlin - So a data set I've worked with a lot has 61,000 relatively healthy people. We’ve now released a data set that’s roughly 140,000 relatively healthy people. The UK Biobank which is a big study based of course in the UK has 500,000 people. So, as we continue to sequence or look at different genetic changes in individuals, it will scale up to hundreds of thousands and millions of people. And that’s what we need if we want to put single individuals in the context of millions.
Ginny - And how do you feel your work is going to apply to the future of healthcare in general?
Kaitlin - Many people have predicted that in the somewhat near future, every individual will have their genetic variation interrogated. So you might be walking around with knowledge about all of your genetic changes. The hope is to be able to educate clinicians so that they can help use this information in order to prioritise what you specifically have.
So we already know, for example, that there are some genetic changes associated with how well you process particular drugs. That’s incredibly important for your GP to know before they give you any particular drug. In the future, the hope is that the work that I've done will give clinicians a way to prioritise what might be involved in the particular disease you're seeing your physician or clinician about. That’s always the hope!