Hundreds and Thousands

To figure out which genes are linked to diseases, researchers have to go large.
14 September 2015
Presented by Kat Arney


When it comes to figuring out which genes and genetic variations are linked to particular traits and diseases, there's only one way to do it, and that's to go large, with cohort studies involving hundreds or even thousands of volunteers. We meet the Born In Bradford bunch, a Canadian cohort, and more than a few pairs of twins. Plus, oh my God, they killed our gene of the month!

In this episode


01:06 - John Wright - Born in Bradford

Professor John Wright leads the Born in Bradford study, involving thousands of families across the city, aiming to improve their health.

with John Wright, Born in Bradford

Kat - This month I've been to the British Science Festival in Bradford - home of the Born in Bradford cohort study. Five years ago a team of researchers in the city embarked on an ambitious collaborative project with scientists in the UK, Pakistan and elsewhere - to study as many babies born in the city as possible, looking at their health and educational outcomes as well as their genetics. I went along to their annual meeting, held at the National Media Museum in the centre of town to find out more about the study and some of the results that are coming out of it. First, I spoke to the director of the whole study, Professor John Wright, and started by asking him what exactly is a cohort study?

John - People will know them from television series like 7 Up and Child of our Time, where we choose a group of people - in this case, it's a birth cohort study, so we're choosing children at birth, in fact in pregnancy. We follow them up and we try and identify which children fall ill and which children stay healthy. And then by collecting the data in early life, we can unpick what causes those illnesses.

Kat - So tell me a bit about the Born in Bradford Study? Who are these lucky kids?

John - We started recruiting children in 2007 and carried on recruiting until about 2011. We've recruited 12,500 mothers, about 13,500 children and about 4,000 dads. There are more children than mothers because some mothers have more than one child. About half our children are of south Asian, mostly Pakistani, origin and about half are of white British origin.

Kat - Why Bradford, because there are lots of towns in the UK, or you could do the whole of the UK? Why Bradford in particular?

John - So Bradford is a big city. It's the 5th biggest city in the UK. It's very multicultural. It has high levels of ill health and so, it's a good place to study. It's a good population laboratory to try to understand. If you're going to understand the cause of diabetes and heart disease and genetic disease, you're going to choose somewhere which has high levels because you've got more chance of trying to find out why. But it's also a city that has great pride and great cohesion. I think there are not many cities that could've stepped up and done what we've done in Born in Bradford.

Kat - How were you actually studying these children and their parents? What sort of things are you measuring and how long are you going to follow them for?

John - We started when the mothers were about 24, 26 weeks pregnant. We've got details, questionnaire data, DNA, biomarkers, body measurements of all the mothers. When the children were born, we measured all the babies and took blood samples from the babies, and then we followed them up over time, groups of them usually, around specific projects. And most importantly, in this 21st century when we have electronic records for everything, we've got consent from all the families to link up their data, their medical data, their education data. And by doing that, we can track remotely at very low cost what's happening to these children in their health and in their education. How long this goes on for? I think the most exciting results in Born in Bradford will be long after I'm dead. But I'm hoping we're going to get some interesting findings along the way.

Kat - What kinds of things can these cohort studies tell us? Why are they so valuable?

John - One of the challenges we face in trying to unpick the causes of disease is firstly selection bias, in that we just choose people that have the disease or don't have the disease for particular reasons. By choosing a cohort, you don't know who's going to get the disease. So, it's unbiased in its sampling. The other key aspect of it is that it's prospective. So, you're collecting data before something happens so you're not relying on the recall of the mothers or the fathers of the children, which is biased, to say, "Maybe it was that meal I had that gave me food poisoning", which we all tend to do. We all tend to attribute things according to our own lay beliefs. So by collecting clear data and backing that up with good genetic data and biomedical data, we can be pretty accurate in what we're finding out.

Kat - It's also a big commitment for a family to be part of this if you say, "We're going to follow you as long as we can". How do the families feel about being involved in this, particularly as in some cases, you are really digging into their genetics and their health?

John - What has amazed us in Born in Bradford is how engaged the community has been, how altruistic people are, actually. When there's a good cause in terms of providing solutions to their children's children about improving health, how people want to participate. So, we haven't had any shortage of goodwill.

Kat - John Wright, director of Born In Bradford. 

05:45 - Eamonn Sheridan - Finding disease genes

One man who’s been digging into the Born in Bradford data is Eamonn Sheridan from Leeds University.

with Eamonn Sheridan, University of Leeds

Kat - So what kinds of disease-causing genetic changes have turned up in the Born in Bradford study so far? One man who's been digging into the data is Eamonn Sheridan from Leeds University.

Eamonn - Well, from a straightforward academic perspective, we've identified 30 or 40 novel disease genes in that period of time at least. And several of those are tremendously interesting biologically. Usually, we only identify disease genes that cause disease in a couple of families locally. But the reason that it's important is because of what it tells us about basic biology. So, we've identified disease genes that are involved in very basic mechanisms of the way that your brain grows and the way that your brain is formed, which is an extremely hot topic in biology generally. And then we've also identified genes that are involved in the way that the little energy factories in the cells, the mitochondria, work - fairly fundamental things about those. So, although we tend to identify disease genes which are of importance clinically to only a restricted number of families, they're biologically really important because they give you shortcuts into understanding the biology which would otherwise be really difficult to obtain.

Kat - The Born in Bradford study is carrying on for a long time into the future. Do you think that the gene variations you've identified now are pretty much the low-hanging fruits? Do you hope that there's going to be many more coming out?

Eamonn - These are definitely the low-hanging fruit at the moment because these are the genes where faults in a single gene cause a single disorder, and that's relatively straightforward. But the nature of the Born in Bradford cohort, particularly this kind of bi-ethnic mix of white British people and Pakistani people, means that we can investigate other problems as well. One of the areas that we're particularly interested in is diabetes and we know that the frequency of diabetes in the Pakistani community is greater than in the white British community. The comparison of genetic variation in those two communities ought to give us an idea of why it's more common in the Pakistani community than the white British community. But that's a longer term and much more complicated programme than we've pursued up to now.

Kat - More broadly speaking, how valuable to genetic researchers like yourself are these large cohort studies?

Eamonn - From the perspective of sorting out common diseases, there's absolutely no doubt that these large cohorts are the only way forward because in essence, with classical genetics where a fault in a single gene results in a single disorder, the effect of the variant is extremely high. It results in a distinct disorder. Whereas with common disorders, the effect of individual variation is going to be actually very small. And it's only by having large cohorts with big, big numbers that you can actually identify things that are likely to be significant.
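
Eamonn's point about small effects needing big numbers can be illustrated with a rough power calculation. The sketch below is not a method described in the interview: it uses the standard Fisher z approximation for detecting a correlation, and it assumes a significance threshold of 5e-8 (a convention in genome-wide studies), 80 per cent power, and two invented effect sizes.

```python
import math
from statistics import NormalDist


def samples_needed(r: float, alpha: float = 5e-8, power: float = 0.80) -> int:
    """Approximate number of people needed to detect a correlation of size r.

    Fisher z approximation: n = ((z_alpha/2 + z_power) / atanh(r))**2 + 3.
    alpha and power are assumed defaults, not values from the interview.
    """
    if not 0.0 < abs(r) < 1.0:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)


# Hypothetical effect sizes, chosen only to show the contrast.
large_effect = samples_needed(r=0.5)                # explains 25% of the variation
small_effect = samples_needed(r=math.sqrt(0.001))   # explains 0.1% of the variation

print(f"large effect: about {large_effect} people")
print(f"small effect: about {small_effect} people")
```

Under these assumptions the large effect needs on the order of a hundred people, while the small one needs tens of thousands, which is the gap between a family study and a large cohort.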

Kat - Leeds University's Eamonn Sheridan. 

08:34 - David Van Heel - Human Knockouts

David Van Heel has been studying the Born in Bradford cohort to search for so-called ‘human knockouts’.

with David Van Heel, Barts and the London

Kat - Although Eamonn and his team have found many genetic changes linked to disease, it's not always that straightforward. A Bradford lad himself, David Van Heel is now Professor of Genetics at Barts and the London's Blizard Institute. He's been studying the Born in Bradford cohort to search for so-called 'human knockouts'. More commonly associated with laboratory genetic engineering techniques involving mice or fruit flies, knockouts are organisms lacking both copies of a particular gene - the one from mum and the one from dad. And rather than being created in a lab, human knockouts are naturally-occurring, as he explained to me.

David - Everybody has some knockout genes. Normally, we have about a hundred or so genes where one copy doesn't work and we thought that in the Bradford population, we would find people where two copies of the gene didn't work, for reasons I'll tell you, and that that would be very interesting. Indeed, it's turned out that way: there are actually healthy people who are absolutely fine with genes that don't work, and that actually tells us a lot about human biology.

Kat - Why is this population particularly interesting to try and find examples of these people who've got two faulty copies of particular genes?

David - So, I don't like the word 'faulty' first of all. I like the word 'different'. We all have lots of gene variation. 'Faulty' implies that something might go wrong, whereas what we've actually been seeing is genes that are knocked out, so you don't have the protein, but your health is unaffected. So, why is it interesting? In some south Asian populations, there's a higher rate of closely related ancestors. So for example, among people of Pakistani heritage in Bradford there's about a 20 to 30 per cent rate of people marrying their first cousins, and children who are offspring of such a marriage will in some cases inherit two copies of a gene from the same ancestor.

Going back to the knockout study, we have done something called exome sequencing of 3,000 people from the Born in Bradford study. That's where we've looked at all the protein coding genes in the genome in all those people and we've asked, can we predict that any of those genes might not work, where both copies don't work? We've been finding that in those 3,000 people, there are about a thousand people with double gene knockouts. We've looked at their health records - they're mothers in the Born in Bradford cohort - and these people don't appear to suffer any ill health in terms of medicines or how often they see a doctor. But we do find some very interesting genes which are switched off. So for example, there are a couple of genes involved in hearing where children have been described with variants in those genes which cause genetic hearing loss. But we've found adults with variants in those genes, where the genes don't work, whose hearing is perfectly fine. Indeed, some of them have had audiometry, which is a proper hearing test, and that's completely normal. So, what that's saying is that for some genetic variants which are thought to cause a genetic disease, when you look in healthy populations, you can find those genetic variants.

And that suggests that the risk is actually considerably less than might have been thought and has implications for the advance of genome sequencing - there's now a lot of companies offering genome sequencing. You can go out and buy your own genome sequence and there's a lot of ethical discussion about feedback of results. But what we have shown is that even for the genetic changes which might have the most obvious effect on a protein in many cases whilst they might alter biological pathways, they don't affect health.

Kat - It almost sounds contradictory to the idea many people might have of genetics: that a difference or a change in one gene, if you get two versions of it, causes a disease - the sort of idea of Mendel's one gene, one disease. What your finding suggests is that people can be walking around with these differences that should cause them a problem, but don't. So, what's going on? Is something else compensating? What do we know?

David - So, I think you're right. Something else is compensating. The variants we find definitely change the protein but don't give the person the condition that might be expected from studies of big multiply-affected families. Actually, I think it's just compensation: there are another 19,000 genes in the genome other than the one we're looking at and the variation in those can compensate. What we've done by studying mothers who came to the antenatal clinic is actually pick healthy people so there's what's known as an ascertainment bias. It's the exact opposite of what people studying rare diseases have done. They've picked people with rare diseases and picked a very severe end of the spectrum. We've picked a very no-disease end of the spectrum and perhaps it's not surprising that we're finding somewhat different and lower risks.

Kat - Where next? How do we start to unpick and understand what's going on and then use that information for health benefits?

David - So, there's a whole variety of things we can do. We're setting up this big study, Genes and Health, taking adults from the population. So, we're not specifically going for healthy, but we're not looking for people with rare and severe diseases. And so, there's quite a lot of other things like drug response. We're looking at people with diabetes, particularly in south Asian populations with very high rates of diabetes, and outcomes from that. Although I've been saying that the risks for some of these genetic variants are lower, they're not zero. So for example, in the Bradford population, we have found 40 people who have knockouts in genes which should cause a recessive genetic disease and looking back at their health records, we found about 20 per cent of those people do actually have the condition that the Mendelian databases suggest. But 80 per cent don't, and we think that's an example of this reduced penetrance of genetic conditions.

Kat - David Van Heel, from Barts and the London, and thanks to Laura Lamming, the Born in Bradford team and the National Media Museum for allowing me access to their meeting. 


15:02 - Meaghan Jones - Asthma and epigenetics

Meaghan Jones is studying a group of Canadian children looking at the genetic, epigenetic and environmental triggers of asthma and allergies

with Meaghan Jones, University of British Columbia

Kat - Now it's time to return to our theme of cohort studies. It's not just Bradford - large-scale studies are ongoing all around the world. And it's not just genetics that they're looking at - there's epigenetics too, a topic we covered in the past couple of podcasts. Meaghan Jones, at the University of British Columbia in Vancouver, Canada, is studying a group of Canadian children, known as the CHILD study - looking at the genetic, epigenetic and environmental triggers of asthma and allergies, from the very earliest stages of life.

Meaghan - Things that happen to you in utero during pregnancy when you're in your mother's womb or early in life lead to health outcomes later on. There are a lot of examples of these. We know about these things already. We know things like children who grew up in adverse environments, if they grew up in a poor neighbourhood, lots of violence and crime in the neighbourhood, they have worse health outcomes, especially things like cardiovascular disease, later in life. So, we know the connections exist, but no one knows why. And so, we're trying to find out if we can find molecular mechanisms that explain some of those connections - how do we see what's happening physically and on a cellular level to connect a prenatal exposure with later health outcome.

Kat - How definite are these links? Because it sounds kind of scary thinking, oh my goodness, something that your mother did when she was pregnant with you, and maybe when she didn't even realise she was pregnant with you, could be dooming you forever to a life of ill health.

Meaghan - Yeah, that's a really good point. It is not all doom and gloom in this field at all. It's definitely not deterministic. The kind of things we work on are very plastic. They're very changeable, but they are sometimes indicators and they're indicators of risk. So, like I said, the example about children growing up in poor neighbourhoods and heart disease, it's not that every child that grows up poor is going to grow up to have heart disease but they are at increased risk. So the nice thing is if we can start teasing out some of the mechanisms figuring out why this has happened, we can figure out the kinds of interventions that can prevent the health outcome in the future.

Kat - What kind of diseases are we talking about? You've mentioned heart disease, but what other sort of things might be linked to these kind of early exposures?

Meaghan - So, one of my major studies is on asthma and allergies. Development of the immune system is very sensitive to insults and we know especially in the western world that incidence of asthma and allergies is growing by leaps and bounds. We don't know why. There's the idea of the hygiene hypothesis - in the western world, we're not exposed to the natural environment the way we used to be.

Kat - Not enough filth basically.

Meaghan - Yeah, we're too clean. We don't go hanging around with cows and chickens in the yard, and that makes your immune system start to overreact if it's exposed to things that are not actually dangerous like pollen or cat hair, or any of the other things where there's no reason for your immune system to act that way.

Kat - So, how do you start looking at these kinds of links between these early exposures and later effects on the immune system?

Meaghan - So, we take advantage of what are called natural experiments, which is a fancy term for cohort studies - some amazing, kind people who are willing to sacrifice time and effort to be part of our studies. I work on one of these big Canadian studies - it's called the CHILD study - a big cohort within Canada. All they did was put up posters in waiting rooms of prenatal clinics that said, "Would you like to be part of a study?" 3,500 women in Canada signed up and they're still there 3 years down the road. They come in once a year, they're collecting data, they fill in the questionnaires. So, these natural experiments take advantage of the variability that exists in the population. We just ask everyone. Anyone who's interested can come in and that gives us a really nice cross-section of the kinds of things that actually exist.

Kat - What sort of things are you looking at then to try and work out, might have an influence later on? What sort of things are you measuring?

Meaghan - We're measuring something called DNA methylation. DNA methylation is a mark on your DNA. It doesn't change the sequence of your DNA at all, but it adds a little bit of a chemical group at specific sites in the genome. The only thing that that does is it affects the way that your genes are turned on and off. So, we think because it's not changing the sequence, because it's very changeable, it can be put on and taken off sort of at will, that it's more responsive to the environment. But it may stick around after its usefulness, is the idea. So, if you're exposed to stress, if your mum is having a stressful time during her pregnancy then your immune system may start reacting to stress because biologically, physically, you're reacting together with your mum. But after you're born, you don't necessarily need to keep that same active mark on and it might stick around afterwards.

Kat - Where do you think we are with being able to definitely link some of these epigenetic effects to actual health outcomes in the long term?

Meaghan - Yeah. We're not there yet. Right now, it is very correlational. So, what we do is we take large groups of people, the bigger the better, and we say, "Do we see patterns in these people that are different from another group of people?" So, that's just correlative. We can't say anything about causation yet. The thing that's really exciting about the field is this is all brand new. We don't actually have that much even at this preliminary stage yet. So, we're still building on that and that's why it's really exciting to be working in the field.

But the next stage - we're starting to get there - is validation. That's the first thing you need to do. Once you find something in a group of people, you have to find another independent group of people and find the same thing or it may just be an effect of the specific group you looked at. After that, once we start finding these effects, we need to drill down a little bit closer into the molecular mechanisms to find out whether the patterns that we're seeing are actually having an effect. It's going to be a long road, but if it was easy, anyone could do it.

Kat - That was Meaghan Jones from the University of British Columbia.


20:51 - Robert Plomin - Tracing twins

Professor Robert Plomin from King's College London explains why twins are so interesting to geneticists, and what they can tell us.

with Robert Plomin, King's College London

Kat - In last month's podcast we heard how Professor Robert Plomin and his team at King's College London have been tracking thousands of twins over fifteen years, recently seeing them through their GCSEs and investigating the genetic components that are linked to academic success. I asked him to explain more about why twins are so interesting to geneticists, and what they can tell us.

Robert - Almost all countries now, something like a hundred countries, have National Twin Registries. The reason for that is the twin method, even in this day of DNA, is still very valuable as an initial screen for whether or not traits are influenced by genetics, by heritable differences between people - which fundamentally means DNA differences between them. About 1 per cent of all births, live births, are twins, and about a third of those are identical twins, called monozygotic, which means a single zygote: a single fertilised egg that in the first couple of weeks of life separates into two clones, and they really are clones. They have identical DNA material. In contrast, the other two thirds of twins are called di-zygotic, two zygotes, meaning two separately fertilised eggs. Like all first degree relatives, they share 50 per cent of their genes. They're just siblings. So, you can use this then as a natural experiment. If something is heritable, that is influenced by genetics, you'd have to predict that these pairs of clones would be more similar, because they share 100 per cent of their genes, than fraternal twins or di-zygotic twins who share only 50 per cent.

And so, the twin method allows us to look at the extent to which these differences are genetic. Take musical ability, which actually hasn't been studied - it's only recently there's been a study done on it. People know it's heritable - the Bachs, the famous Mozart family, that sort of thing - but that doesn't prove it; that could be nature or nurture. So, the twin method is a good way of screening for genetic influence. And so, when I came to the UK from the US, I was interested in the fact that in the UK, there are national statistics whereas in the US, everything is decentralised to states. So, I wanted to get a national registry of twins because epidemiologically, it just makes a lot more sense to have a more representative sample, that sort of thing. So, we were able to do that and that created the Twins Early Development Study, which is a study of all twins born in 1994, 1995 and 1996. So, those twins are now taking A levels and going to university. What we've been publishing on recently is GCSE scores.
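
The comparison Robert describes, identical pairs against fraternal pairs, is often turned into a rough number with Falconer's formula, which doubles the difference between the two twin correlations. The sketch below is an illustration of that textbook formula, not the analysis used in the Twins Early Development Study, and the correlations in it are invented.

```python
def twin_estimates(r_mz: float, r_dz: float) -> dict:
    """Rough variance shares from twin correlations (Falconer's formula).

    r_mz: how similar identical (monozygotic) pairs are on a trait
    r_dz: how similar fraternal (di-zygotic) pairs are on the same trait
    """
    heritability = 2.0 * (r_mz - r_dz)        # share put down to genetic differences
    shared_environment = 2.0 * r_dz - r_mz    # share put down to the family environment
    unique_environment = 1.0 - r_mz           # everything else, including measurement error
    return {
        "heritability": heritability,
        "shared_environment": shared_environment,
        "unique_environment": unique_environment,
    }


# Hypothetical correlations, made up for illustration only.
example = twin_estimates(r_mz=0.80, r_dz=0.50)
for name, share in example.items():
    print(f"{name}: {share:.2f}")
```

If identical twins were no more alike than fraternal twins, the heritability estimate would come out at zero, which is the screening logic of the twin method in one line.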

Kat - How many twins have you got in this study? How do you track them down and recruit them into this kind of study?

Robert - Well, if 1 per cent of births are twins, you'd expect that back when we were doing it in 1994, 1995, 1996, about 1 per cent of all births is about 7,000, 8,000 pairs of twins born a year. And so, when we studied these three birth year cohorts, we were able to initially identify through birth records over 18,000 twin pairs. Amazingly, the parents of over 16,000 of these young twins, in the first year of life, were interested in being part of the study. Twins are great because parents of twins know their twins are special. You're not studying them because they've got some disease or something like that. They're just normal, but they're twins and fascinating.

We have a solid bunch of about 7,500 pairs. That's 15,000 individuals who have been participating regularly. For GCSEs, we have that many pairs of twins, and for A levels now. As they go into university, we've just been funded now. The MRC, I should say, has funded this all along and we're in our fifth renewal of our programme grant. So, we're now funded to study them through 25 years of age. These cohort studies are very valuable because we carried them through GCSEs, but then having 16 years of data on these children from infancy - we've studied them about 12 times over that period - it adds so much value then to add another assessment.

Our pitch now is to say, hardly anyone has studied the transition into what we call emerging adulthood. It's a bit of a buzz word, but this era, it used to be that you went from school to marriage, a job for life, end of story - some people would say. Now, there's this long, long period. It's not like delayed adolescence or something. It's different. There's independence but a sense of trying a lot of things out and on average, it goes on for 8 years, 10 years. And so, it's really a great chance to take these 20 years of data that we have and then use this twin method to study what we're calling sort of functional adjustment to adulthood. We're not going to just study academic skills anymore, but more like the communication skills, the adjustment you need to get through this really wild emerging period of adulthood that we now have.

Kat - In terms of their genetics and their DNA, what are you analysing in their DNA? Are you doing all their genomes? What are you looking at at the DNA level?

Robert - Well, we have collected DNA from about 12,000 of these individuals. And so, like everyone else, we're also trying to find genes using the same methods that people use. In general, people realise you need even larger samples. So, there's a big tendency towards collaboration and consortia where you put the data together to get hundreds of thousands of individuals to detect the tiny effects. You need very big samples. So, we're doing that, but what I'm particularly interested in is this tendency towards what we call polygenic scores. So, you don't just take the one or two bits of DNA that look like they're associated with the trait, say like mathematical ability or achievement in STEM subjects.

Kat - You find like this DNA variation seems to be more common in people who are really good at maths.

Robert - Yeah, that's what an association is. But instead of looking for the 1, 2, 10 or 100, what we're doing is taking tens of thousands of these single nucleotide polymorphisms called SNPs, just DNA differences that are in two forms. That seems to be paying off in a lot of areas of complex traits in medicine as well as in the behavioural sciences. So, once you can do that, you can begin to predict even if you're only explaining a few per cent of the variation in these complex traits. And so, that we can do with our sample sizes and so, that's where we're kind of aiming now.
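
A polygenic score of the kind Robert mentions is, at its simplest, a weighted sum: for each SNP, count how many copies of one form a person carries (0, 1 or 2) and multiply by that SNP's estimated effect, then add everything up. The sketch below shows only that arithmetic. The five SNPs, the weights and the genotype are invented for illustration, and a real score would use many thousands of SNPs with weights taken from large association studies.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Weighted sum of allele counts (0, 1 or 2 per SNP) and per-SNP effects."""
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("need exactly one effect size per SNP")
    if any(count not in (0, 1, 2) for count in allele_counts):
        raise ValueError("allele counts must be 0, 1 or 2")
    return sum(count * weight for count, weight in zip(allele_counts, effect_sizes))


# Hypothetical per-SNP weights: each is tiny, some positive, some negative.
weights = [0.02, -0.01, 0.03, 0.005, 0.01]

# Hypothetical genotype for one person at the same five SNPs.
person = [0, 1, 2, 1, 0]

score = polygenic_score(person, weights)
print(f"polygenic score: {score:.3f}")
```

No single term in the sum matters much on its own, which matches the point that prediction comes from adding up tens of thousands of small differences.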

Kat - So, it's not about pinning down a specific gene or a specific kind of region of the DNA. It's more about the genetic landscape I guess, kind of the whole picture, the tone of the whole picture in someone's genome.

Robert - Yes. It's not every single bit of DNA. I mean, for some traits, you'll find that some 10,000 SNPs are making more of a difference than for other traits. So, it's not the same genes that affect everything. But in the cognitive realm, what's interesting is how general the effects are. I think that's beginning to get the attention of neuroscientists because for a long time, neuroscience was kind of modular. They were looking for which bit of the brain does this and which bit of the brain does that. Other people are saying, "Why are you doing that?" because surely, the brain evolved to be a general problem solver. Instead of making it easy for neuroscientists by finding single tracks between genes, brain, and behaviour, it makes much more sense to take advantage of the little differences that are there in a lot of different systems. So, I think systems approaches, network approaches is what it's about. In genetics, we see that too. People are doing network sort of analyses, whole gene analyses rather than looking at one SNP (single nucleotide polymorphism) at a time.

Kat - Robert Plomin from King's College London.


28:41 - Gene of the Month - Kenny

And finally, it’s time for our gene of the month, and this time - Oh my God, they killed Kenny!

with Kat Arney

And finally, it's time for our gene of the month, and this time - Oh my God, they killed Kenny! Named by University of Strasbourg scientist and South Park fan Sophie Rutschmann, fruit flies with a faulty version of the Kenny gene die within two days after being infected with certain bacteria, similar to the hapless character in the cartoon show. Also known by the names NEMO and IKK gamma, Kenny is an important part of the fly's immune response, working together with another gene known as Relish. 

