Big data, big release

14 July 2017

Interview with

Peter Donnelly, Wellcome Trust Centre for Human Genetics, University of Oxford

This month saw the release of the biggest ever dataset from the UK Biobank - a huge long-term study of human genetics and health. Researchers all around the world are able to apply to download the data and trawl through it to answer vital questions about health and disease. Kat Arney spoke to Professor Peter Donnelly, the principal investigator of UK Biobank, based at the Wellcome Trust Centre for Human Genetics in Oxford, to discover what the fuss is all about.

Peter - The idea behind UK Biobank was to collect a very large cohort of individuals in what's called a prospective study. So those individuals are recruited, they give consent for a range of studies to happen at the time they're initially studied, and then over time, there’s further information collected either directly from individuals or with their agreement to link with medical records.

So it gives you a resource which is valuable and interesting from day one, but which in some sense matures. As time passes, some of those individuals go on to develop some diseases and others develop other diseases.

The very important thing about the prospective study is you measure lots about the individuals before any of that happens. So you can look back and see whether individuals with these properties are more likely to go on to develop a particular disease, and so on. And the fact that it’s as large as it is, 500,000 individuals, is a real key to its value and its success.

Kat - In terms of looking at this kind of genetic level, what have you done?

Peter - A few years ago, UK Biobank convened a group of many of the experts in human genetics in the UK to think about how to get the best value out of the UK Biobank resource in terms of genetics. The very strong view of that group at the time was that the right way to do it then was to measure genetic information in all of the participants of the UK Biobank. If you are able to decide in advance on a set of positions in our genome, in our DNA, to measure, then that can be done these days reasonably economically.

And so, there was an expert group convened to design the chip, the array that does the measuring for each individual, and for each person, that allows measurement of about 800,000 positions in their DNA, in their genome. And we were able to choose them so that we’d include lots of things we thought might be interesting for diseases, other things we thought might be interesting for other reasons, and then many, many other markers or genetic variants, or SNPs ("snips", as we call them), across the genome which allow us to interrogate the whole genome reasonably effectively.

Kat - So we’ve got these 500,000 people, we’ve got all this genetic information – you’ve genotyped them – what other information have you got about them?

Peter - One of the enormous strengths of UK Biobank as a resource is the depth and the breadth of information that’s measured on the 500,000 participants. So, when each of those individuals was initially recruited to the study, they spent half a day or so at what's called an assessment centre where a lot of information was collected about them. They were asked questions about themselves, questions about their medical history. Many things were measured about them – height and weight, things about their vision and their lung capacity.

So, there's a lot of measurements about them as individuals. They gave blood samples. Some of that has been used for the DNA analysis, for the genotyping, but other aspects of those blood samples are used to measure what we call biomarkers, so things like cholesterol levels and other things circulating in our blood that are informative about health. Since the initial assessment, there have been follow-up studies on subsets of the cohort to get more detailed information from them. A subset of people gave information about their diet over a period of time, which is available on those individuals.

There's been some re-assessment of individuals. In a really exciting development recently, there's imaging, so different sorts of medical imaging done on people’s brains, their abdomens, their arteries and so forth, which again is a rich source of information. Recently, UK Biobank has succeeded in linking some of the health information kept in our NHS records with the individuals. So there's an enormously rich collection of information on many things, like height and weight, and then various medical things, and really detailed things like images, measurements and biomarkers, on all these individuals.

Kat - This is something that you are putting out into the world and you want people to use. What sort of things could researchers look for in there? How do you see people using this data? What sort of questions could people ask with it?

Peter - I think there are a number of very natural things that people want to study, but one of the really exciting scientific opportunities is that the resource is so vast, in terms of the genetic data and all the other things that are measured on people, that I'm sure there’ll be really clever scientists who think of ways of exploring that data to tell us things about human biology and human disease that we wouldn’t guess now. So that’s really exciting.

The obvious thing people can do is look at the relationship between genetic information and the various outcomes that people have. Now those outcomes might be in terms of whether they get diseases or not, arthritis or heart disease. That could be really helpful in understanding more about the disease and, in time, developing new treatments for the disease.

Another really important thing which UK Biobank will enable researchers to do is to understand the way risk factors we have from genetics interact with other things around us like our diet, our lifestyle, other aspects of our health. In fact, one of the key drivers in setting up the UK Biobank resource was to have a study which was, firstly, large enough and, secondly, collected the right kind of information to allow researchers to do exactly that. So up until now, largely, we’ve been able to look at how having this genetic variant or that one affects how likely you are to develop the disease.

But what UK Biobank allows is to ask questions like, if you have this variant rather than that variant and this diet rather than that diet, does it matter? So, helping us to learn about the ways in which the stuff we inherit – our DNA – interacts with things in our day-to-day life, both the choices we make and the things that we interact with in the environment. The ability to study what are called gene-environment interactions will be a really powerful use for UK Biobank.

Kat - So sort of opening up the black box between our DNA and the way we come out?

Peter - Absolutely, yes.

Kat - In terms of this as a scientific achievement, what does this represent?

Peter - I think there's no doubt that as a scientific research resource, UK Biobank as it stands now is an extraordinary collection of information for researchers. It’s by a long way – I think – the most valuable resource of its kind available anywhere in the world. One of its strengths is that it is available to researchers around the world, researchers who are working to answer questions that are consistent with the mission and the framework of UK Biobank and it will only improve.

It will improve in two ways. There will be more information collected about the individuals in the study and as time passes, there’ll be more outcomes in terms of the diseases individuals suffer from or other things that happen to them which will be helpful for researchers to study.

Kat - How would you summarise your hopes for this dataset and maybe for UK Biobank more broadly, say, over the next 5 years? Where would you really hope that this will go?

Peter - I think scientists studying the Biobank resource over the next 5 years will uncover a whole lot of novel, really key insights about human biology and human disease, and some of those will have immediate impact in terms of the way we treat patients. Others will lead to new ways of developing drugs, new ways of choosing treatments in particular situations. Its impact on human health and healthcare will be enormous, I think.

Kat - Peter Donnelly from UK Biobank. And if you’re a scientific researcher and you’d like to get your hands on the Biobank data, you can apply through the website - that’s
