A hundred thousand genomes

We look at the UK government's plan to sequence the genomes of 100,000 people
14 March 2015
Presented by Kat Arney


Genomics England


Over the past year the Government has unveiled an audacious programme under the banner of Genomics England, aiming to sequence the genomes of 100,000 people affected by cancer and rare genetic diseases. We take a look at some of the practical and ethical issues around the project. Plus, our gene of the month comes from the land of the forever young.

In this episode

01:06 - Mark Caulfield - Genomics England

Mark Caulfield, chief scientist for Genomics England, explains the idea behind the 100,000 Genomes Project.

Mark Caulfield - Genomics England
with Mark Caulfield, Genomics England

Kat - As the cost of DNA reading technology has come down, it's now possible to think about whole genome sequencing on a grand scale, in order to help us understand more about how our genes affect our health. With this in mind, the UK government, in partnership with the NHS, has set up Genomics England - a company aiming to read the genomes of 100,000 people affected by cancer or rare genetic diseases.

I went along to a recent event about this 100,000 Genomes Project, run by the Progress Educational Trust, which aims to promote discussion around research into and use of genetic technologies. One of the panellists was Mark Caulfield, chief scientist for Genomics England. I asked him to explain what the project is all about.

Mark - The 100,000 Genomes Project is sequencing the entire genetic code of people with rare inherited disease, cancer and infections. The goal is to use the entirety of the genetic code to understand the causes of those disorders, and possibly create the opportunities to develop new medicines and new diagnostics in the NHS. And that means that we can combine talents of the whole National Health Service to deliver this programme, working with key partners in the NHS but also, some of the charities like Cancer Research UK who face the diseases we're working on.

Kat - Can you explain to me a bit about the scale of the project? How many people with each type of disease are you recruiting and how is it going to work?

Mark - So, we have split the genome sequencing into two segments - half for rare disease and half towards cancer. In the rare disease, we're including people with severe response to infection because it could be a form of rare disease. What we're actually doing is receiving nominations from the NHS, particularly in the area of rare disease, for specific diseases with diagnostic unmet need. In cancer, to make a difference to the knowledge base, sometimes we will need to sequence quite a large number of people with a cancer to take account of the fact that in the cancer world some tumours don't just have one group of drivers to cause the tumour. They may have multiple different drivers and that means that there is heterogeneity or differences within the cancer population. And so, we need to study a lot of people with cancer to unravel that. And so, we're working with Cancer Research UK and other experts in cancer worldwide such as the International Cancer Genome Consortium so that we will combine our data with international efforts to build a global picture of the architecture of cancer.

Kat - Tell me a bit about the kind of analysis that people will be doing. Is this the entire genome, the whole thing?

Mark - Yes. It is as much as we can read of the 3.3 billion letters that make you who you are, that are the blueprint for the colour of your eyes, whether your hair is curly or how tall you are. But also, this may contain variations and these variations may create susceptibility to disease. The variation we're focused on will allow us hopefully to develop new insights that will help people with treatment for cancer and to get new diagnoses and, possibly, treatments for rare disease.

Kat - Some of the things that people are concerned about with doing this kind of analysis is that not only might you find some of the key genetic drivers of these diseases but you might find other things as well. How are you coping with that?

Mark - What we've done is developed, with the advice of clinicians and with the full involvement of patients, a limited list of items that we will feed back, which are in essence findings that we're going to look for. So, these are things with severe consequences that we'll actually go and look for in the genome. So, to give you an example of some of those, if we discover a mutation that you've inherited in your DNA or developed yourself, we will feed back if it could cause a severe cancer or if it could cause familial hypercholesterolemia or high cholesterol. The reason for that is because some of those mutations may cause cancer in very early life and if we knew about them, we wouldn't hesitate to continue to screen the person for that emergence of the cancer and then try and do something about it.

Kat - But if it's a mutation that you can't do anything about, it's probably better then not to know.

Mark - That's not necessarily true, because if you know about these things it may alter the way you live and how you interact with other people and it may alter your health behaviours. So, there are some examples where people have been given findings back about Alzheimer's and even though that can't be changed, they welcome the opportunity to know that information and to be able to make lifestyle modifications, some of which may or may not impact the disease. But actually, by and large, people are very receptive to this information. But in the 100,000 Genomes Project, nobody has to receive anything other than the diagnosis of their disease for which they enrolled. So, they don't have to have any of these other findings given back to them.

Kat - One of the things that you're doing with the rare diseases is looking at parents and their children. There is an issue sometimes that fathers are not necessarily the father of their child. Is there a risk that this kind of information could be uncovered and what would you do about that?

Mark - Yes. It's in your genome and we will know that, but it's not a medical condition. That's not something that we'll be feeding back. The reason is because if the family unit believe they're a family unit, who are we to disturb them? It's not going to affect them medically. And so, it's a possibility which we most likely will discover for some people, but it's not a medical condition.

Kat - Some of the other things people have raised are about, "Whose data is this? Whose genome is this? And who could have access to this kind of very personal data?"

Mark - When people make informed consent to participate in the programme, they donate their DNA. In essence, your genome remains your genome. If you want it back, we will give it back to you either as individual genetic variations on a USB stick or you can pay us some money and we'll give you the whole genome back. But that has to come on a hard drive and we don't have the money for that. So, it's like having your computer given to you because it's 220 gigabytes for a genome, roughly. In cancer, it's actually bigger because we have to sequence cancer more times to understand it.

Who owns the data from the programme? To make this an open collaborative environment and have open innovation, we've said that Genomics England will own the results of the sequencing and the combination with the clinical data. This is so the researchers, the NHS and indeed, people from industry could work together in open innovation space and then come to us and say, "We think we've got a really good idea to develop a new medicine. We'd like to license it from you." What that does is it changes the atmosphere inside the research collaborative environment to encourage people to actually work together much more closely. And that, we hope will also draw in opportunities for patients because access to that data by industry will encourage them to bring their medicines to the United Kingdom because we'll be able to do stratified healthcare on them.

Kat - One of the things that's increasingly happening is that people are having to grapple with their genomes, with understanding risk, with the understanding that finding a particular genetic variation doesn't automatically mean you'll get a disease. It's about risk and what that means. How are you trying to educate the public about what some of the findings in their genomes actually do mean?

Mark - Well, this is a really important point because we need to grow public understanding of what this means and also, in some measure, demystify the genome. So, we mustn't make it a black box. Patients are fed up with black boxes in medicine and what they really want is for us to open up their genetic code and healthcare. And so, that's what we're going to do. So, working with patients who enrolled in our groups and also, through CRUK, we're working with a number of patient groups to inform the programme, develop the programme. And through our public engagement team, we'll have a number of public engagement events. We are doing similar events around the country, already interacting with the public and patients. Public trust and patient engagement and trust is paramount to the success of the programme and we prize it very highly.

Kat - Looking say, maybe some 5 years into the future, what would you like to see and mark as success for this project?

Mark - A really clear marker of success would be at least new diagnoses given to people with rare disease, affected by rare disease. The possibility of having new stratified medicine - that may take longer than 5 years to develop for cancer. But understanding the architecture of cancer would be hugely useful to trying to prime innovation in that area. And may also allow us to use existing medicines better or get some medicines that are stuck on shelves because we don't know what to do with them and they didn't work in their original diagnostic area off the shelf and given to patients.

Kat - That was Mark Caulfield, chief scientist for Genomics England.

A baby explores a model of DNA

10:18 - Sarah Wynn - Sequencing for rare disease

Sarah Wynn from Unique explains how families affected by rare genetic diseases hope to benefit from DNA sequencing.

Sarah Wynn - Sequencing for rare disease
with Sarah Wynn, Unique

Kat - One key group of people who will be offered genome sequencing as part of the 100,000 Genomes Project is children and others affected by rare genetic disorders, who will be invited to have their DNA read, as well as their parents.

In the hubbub after the Progress Educational Trust event, I caught up with Sarah Wynn, information officer at Unique - an organisation that supports families affected by rare genetic disorders - to find out what families were hoping to get out of being part of the project.

Sarah - What most of our families are looking for is a diagnosis and of course, sequencing gives you a much better chance of getting a diagnosis. So, I think that's the real benefit that we're going to see for our families. I think additionally, we are dealing with rare disorders. With rare disorders, what you want is a big dataset in order to get more people. When things are rare, you need a big dataset to find other people who've got the same rare disorder. So, doing this on a big scale and pooling the data and looking at the data is really important in order to learn more about these individual rare disorders.

Kat - What do we actually class as a rare genetic disorder? What sort of things are we talking about here?

Sarah - Well, at Unique, we support almost all rare chromosome disorders, so not Down Syndrome because that's not so rare and also, it's adequately supported elsewhere. What we support, historically, is those who aren't supported elsewhere. So, we sort of have mopped up all of the...

Kat - ...spare diseases!

Sarah - ...all the other ones. And of course, loads of our members are completely unique. So, there is no one else who has the same diagnosis as them. So, we're not just rare but we go straight all the way down to unique. Of course, that's going to happen more and more with genome sequencing.

Kat - I've heard before when it comes to genetics, we're all a little bit mutant because everyone is unique in their own way.

Sarah - Exactly.

Kat - Where do you draw the line as to what classes as a disorder and what sort of things are you looking for in someone's genome? How do you know that that's the gene, and what can this sort of project tell us?

Sarah - Well, I think that's a good question. Of course actually, one of the issues is that having a change in a particular gene or a particular chromosome doesn't necessarily mean that you have the disorder and some things are risk factors or they're involved in penetrance. We have seen at Unique that we have families that have a small piece of chromosome missing, a tiny piece missing and it's within their family. There are multiple members of the family that have this missing and yet, only some of them are affected, some of them are unaffected. So of course, your other genes and your environment play an important role in that. Genetics is a bit more nuanced than just a yes or a no and it's really important that we acknowledge that and try to educate people a bit more about genetics so that that's really understood.

Kat - Because obviously, it's not a straight line from genetics to how you turn out.

Sarah - Exactly. It's very complicated. Of course, although we're all similar to each other, we are all different from each other and everybody has lots of tiny changes in their genome. And so, I think these projects are really trying to find out what changes cause problems, or what changes are more likely to cause problems than other changes.

Kat - Some people raised a whole range of issues when we're dealing with this kind of technology, from privacy to the kinds of things you might find that you might not want to find. If you're looking for a rare disorder, you might find that you have an increased risk of say, Alzheimer's or something like that. Broadly within the rare disease community, do you think that these risks and these issues are worth it for the benefits that you could gain?

Sarah - Well, I think it's really important with these things that you are able to opt in or out. So, people can make the choice about whether they receive those findings or not. Of course, these things aren't really new. I mean, they might pop up more but we have a clinical setting in which we're used to dealing with these sort of unexpected findings that don't relate to the initial problem that you were looking for a diagnosis for. So, I think this is a debate that will continue to happen. I think it's important that families have a say and so, this comes down to consent about whether they receive them or not. I think different people are going to make different decisions. And so, it's sort of an ongoing debate I think.

Kat - Given the revolution that we've had in gene sequencing technology just in the past few years and what projects like Genomics England are promising for the future, how would you like to see things changing for families affected by these rare diseases over the coming years?

Sarah - Well, I think for us, the diagnosis is really only the start of it. Actually, what we want is once you have a diagnosis, for you to know what that means in terms of the future of your child but also in terms of care management and therapy and things like that. So, we would like to see along with it the genetics looking at sort of care management and therapies and where that's going to take you.

Kat - That was Sarah Wynn from Unique.

Close up of peanuts

15:23 - Gene variations go nuts

Researchers have found a region of the human genome associated with peanut allergy.

Gene variations go nuts

A team of US-based researchers has discovered a region in the human genome that seems to be associated with peanut allergy - the most common food allergy among children, which can be fatal. Writing in the journal Nature Communications, the team analysed DNA from more than 2,500 children and parents enrolled in a large food allergy study. Scanning around a million genetic markers across the human genome, the team found that variations in a region of the genome harbouring certain genes known as HLA genes, short for human leukocyte antigen, were linked to the risk of developing food allergies, including peanut allergy.

In total, the variations they found accounted for about 20 percent of peanut allergy in the people in the study, but not everyone with the variations had the allergy. The researchers think that other changes in the genome known as epigenetic changes - chemical tags added on to DNA which don't affect the underlying DNA code itself - might explain these differences in susceptibility. Intriguingly, these epigenetic marks can be altered by the environment, including the diet, but more research is needed to figure out how all these variations and changes fit together to increase allergy risk.

Food allergies are rising rapidly in the US and Europe, and scientists are searching for answers as to why this might be the case. By understanding the interplay between inherited genetic variations and the environment, it might be possible to predict the risk of severe food allergies in future, and even develop drugs or diet or lifestyle interventions that help to control or even prevent them.

16:36 - DNA spellchecker affects mutation rate

Scientists have discovered that our cells' 'spellchecker' doesn't work equally well across the whole genome, creating mutation hotspots.

DNA spellchecker affects mutation rate

Researchers from the Centre for Genomic Regulation in Barcelona, Spain, have discovered that the molecular toolkits that repair our DNA don't seem to work equally well across the whole genome. Looking at 17 million gene faults across the DNA of 650 cancer patients and publishing their findings in the journal Nature, the scientists discovered that certain parts of the genome where genes are actively being read tend to have more accurate 'spellchecking', through a process called mismatch repair, than the parts where genes are turned off. This means that even though mutations are occurring at more or less the same rate throughout the whole genome, mistakes in genes in these lively areas are more likely to be corrected than those elsewhere. Importantly, many of the uncorrected regions seem to contain genes that are implicated in cancer when faulty.

Intriguingly, the team also found differences in patterns of faulty genes in different types of cancer, reflecting underlying differences in patterns of gene activity. And once the DNA spellchecking machinery was turned off in cells, they started picking up mistakes across the whole genome, not just in under-used areas. Faulty mismatch repair is found in several types of cancer, including bowel, stomach and womb cancers, so these findings give new insights into how these tumours might start, or even how they might be treated more effectively in future.

Hessian fly, Mayetiola destructor, barley midge. A significant pest of cereal crops including wheat, barley and rye. Though a native of Asia it was transported into Europe and later into North America in the straw bedding of Hessian troops.

17:53 - Shoo fly, don't bother wheat

Scientists have sequenced the genome of the Hessian Fly, a major wheat pest, in order to try and reduce its impact on crops.

Shoo fly, don't bother wheat

An international collaboration of researchers has sequenced the genome of the Hessian fly, whose larvae feed on wheat plants and are a major agricultural pest around the world. The grubs inject their saliva into wheat seedling stems, hijacking the plants and creating galls - clusters of abnormal tissue - that provide food for the larvae but stunt seedling growth.

Published in the journal Current Biology, the analysis shows that the fly's genome contains a large number of rapidly evolving genes that encode proteins that act as control switches inside cells, turning genes on and off. And these proteins are remarkably similar to wheat plant proteins, suggesting that they mimic normal wheat proteins in order to trick the plants into making galls. But at the same time, there seem to be genes in wheat that can also evolve quickly to counter this attack. In fact, around a third of the Hessian fly's genes don't seem to have a clear counterpart in other insect genomes, suggesting they are evolving fast as a result of this genetic arms race.

The scientists hope that the unveiling of the Hessian fly genome will lead to better ways of making wheat resistant to the pests, and providing farmers and plant breeders with more information about the best varieties of wheat to grow. Knowing more about the genes and proteins within the fly larvae might also point towards more effective, highly targeted pesticides or other control techniques in the future.

Thinking girl

19:32 - Anna Middleton - Genetics and ethics

Genetic counsellor and social scientist Anna Middleton explains some of the ethical issues around genome sequencing.

Anna Middleton - Genetics and ethics
with Anna Middleton, Wellcome Trust Sanger Institute

Kat - You're listening to the Naked Genetics podcast with me, Dr Kat Arney. Still to come, our gene of the month is forever young. But now it's time to return to our main topic of mass genome sequencing, focusing on the 100,000 Genomes Project. Although many people - particularly in the scientific and rare diseases communities - are keen to see this kind of research happen, there are significant concerns around ethics, data, privacy and more. Genetic counsellor and social scientist Anna Middleton, based at the Wellcome Trust Sanger Institute outside Cambridge, is investigating how to bring forward more informed discussions about the ethics of genetics. I started by asking her to explain what some of the key issues are.

Anna - The key ethical issues really relate to expectations, managing expectations properly and an understanding of what can be delivered and what can't be delivered. With the 100,000 Genomes Project there is the opportunity for three groups of patients - so cancer, rare diseases, and infectious diseases - to access sequencing technology. By doing that, they may gain answers to a diagnosis, to a range of treatments that are possible for them, or they may gain absolutely nothing. The promise of the delivery of answers is really bandied around but actually, it may offer nothing useful at all. I think patients need to go in and engage with it with their eyes open. As in any genetic test, if you get results related to anything, the results are often relevant to not only yourself but also to your immediate family as well - so your siblings, your parents, your children. So, patients may go for a test for one thing and come out with answers relevant to their whole family. And so, that raises ethical dilemmas for different people and so it sort of really brings to the heart of this that consent is the most important thing, and that people really understand what it is that they're signing up for, they understand what the options are in terms of results, and they go into it with their eyes open.

Kat - As I see it, one of the challenges with this kind of technology is we're moving towards being able to do whole genome analysis, we're moving towards an era of the thousand dollar genome where you can have everything tested for regardless of whether it's relevant or not. Is it that the technology has raced ahead of our ability as a society to understand this and to cope with it and to think about it?

Anna - Well, the technology has raced ahead and just because it's easy and relatively cheap to look at 20,000 genes, that does raise the question of, well, should we? But in terms of healthcare, policy's really been shaped around answering those specific clinical questions. So, in a healthcare setting, really, you're not going to get an analysis of 20,000 genes in one go and have your whole genome delivered to you on a plate. Really, what they're going to be doing is just fine-tuning it to answer those specific clinical questions. So, I think it's a little bit misleading to be thinking that there's going to be an absolute deluge of data that people can't cope with. That's probably not what we're dealing with say, in the NHS.

Kat - Obviously, there is scientific research going on into sequencing, but there's also research going on into the ethics and the public understanding of this. Tell me a bit about the work that you're doing to try and understand what people think, what people understand, and to shape the policy in how things do go forward?

Anna - Yes. There's lots of assumptions about how people might want to use a technology and what they'd want from it, but very little empirical data that actually asked people what they want from it. And so, I designed the GenomEthics study which is a very large scale survey to try and get people to engage with this. The way we did that was to create ten short films that sit in the survey and they describe the ethical issues raised by genomics. The films really ended up being a great hook to get people interested in the topic. And they helped the survey to go viral and we had 7,000 responses from 75 different countries. And that's given us a really large data set to try and understand what people want from this and what they think about it.

The overwhelming response is that people are really excited by this. So, they like the idea of knowing what's in their genes. They feel connected to it, they're inspired, they're interested. So, that was a really nice finding. Then we asked people, "What would you actually want to know? Would you like to know about genes linked to serious life-threatening conditions that can be prevented? Would you like to know about serious life-threatening conditions that can't be prevented? Would you like to know about information relevant to your children? Would you like to know about information relevant for when you're older later in life? The things that aren't relevant now, but maybe in the future." As we went through different categories of information, as the categories became less serious, or less treatable, people were less interested.

So, mostly across the board, irrespective of where they were in the world, people were most interested in data relating to serious life-threatening conditions where some action could be taken to protect the person against the condition. They were least interested in uncertain data or receiving raw sequence data. But actually, what was so fascinating was that even for uncertain data and even for very low risk data relating to health conditions that have a very low chance of actually happening, people were still saying, "Yes. If you know it, I'd like to know it too if this is interesting and useful for me."

Kat - How would you like to see things moving forward in this area - the 100,000 Genomes Project, Genomics England is launching, they're starting to gather people. There's large scale sequencing projects going on all over the world. How would you like to see things moving forward in our public understanding and also, in terms of some of these ethical issues?

Anna - To me, I feel it's incredibly important to have a very robust and sensitively delivered public engagement exercise. It's a real opportunity missed if people don't understand what this technology can offer and are possibly fearful of it or just confused by it. We really need to explain what it can offer, explain what it can do and help people to have a populist conversation about it. One of the next projects that I'm working on is looking at really how to turn genomics from something that's currently quite anti-social into something that's quite social. I mean, how do you start a conversation about it with people who have no clue what it means? And so, I'm looking at working with people from the advertising industry to try and get really simple messages about what this can actually offer out into the public and to get people talking about it. And then once people are talking about it then they can choose to engage with this or not. But at the moment, we're at this situation where the science is moving so fast and it's going to be implemented in clinics so quickly, but the public aren't really there with it. We need to start having national and international conversations about what it could actually do.

Kat - I'm a science journalist, I'm a science broadcaster. I've spent a long time trying to get across messages about how genes work, what's in our genes, and what they can do for us. Obviously, people who are listening to the Naked Genetics have some kind of interest in genetics. Do you have any tips maybe for me or for my listeners about what we can do, how we can get engaged in those conversations, or where we can go for more information?

Anna - How to actually start a conversation about this is incredibly difficult. We don't yet know really what the hooks are to start that conversation. You know, is it something about, we're all related to each other? Is it something about our identity? Is it even something as simple as - well Angelina Jolie has had a test for something genetic and starting the conversation that way? I mean, how to actually bring this into conversation will be really, really difficult and that's something I'm currently trying to research. We need to understand from a social sciences perspective what actually touches people, how it's meaningful for them, and how it connects them. I mean, my mission is to do more work with film and to try and discover the metaphors that people like to use to explain genetics and genomics. If I can create a series of films that have no spinning double helixes in there, then I'll feel I've made a contribution.

Kat - Anna Middleton from the Sanger Institute, and if you'd like to have a go at her Genomethics survey yourself, it's online at

Tir nan Og

28:24 - Gene of the Month - Nanog

Our gene of the month is Nanog - named after the mythological Celtic land of the forever young.

Gene of the Month - Nanog
with Kat Arney, Naked Scientists

And finally, it's our Gene of the Month, and this time it's Nanog, whose name is taken from Tir nan Og, the mythological Celtic name for "land of the forever-young". First discovered in 2003 in mouse embryonic stem cells - the immortal cells in the early embryo that can turn into any part of the body - the Nanog gene makes a protein that acts as a transcription factor, which switches on other genes. It was originally thought to enable stem cells to keep multiplying while maintaining this multitude of possible fates.

Now the picture is a bit more complicated, and it looks like Nanog plays a subtly different role by actively preventing the cells from heading down the road towards adopting specific fates. It also plays a key role in creating germ cells - the special cells that become eggs and sperm.

