Genes and genomes
It's now over a decade since the human genome was first sequenced, costing over a billion pounds and taking 13 years. Today, we're close to the thousand dollar genome. But what's in a genome, and what can it tell us about our risk of disease? Plus we'll be arguing the age of polar bears, and finding out about fish with skin cancer. And our gene of the month is one for the Trekkies out there - it's Tribbles.
In this episode
01:10 - Genes, genomes and junk DNA
with Dr Tim Hubbard, Wellcome Trust Sanger Institute
Dr. Hubbard:: Nowadays, most of what we're doing here is re-sequencing. So we have the reference sequence of human and some of the other vertebrates, and we're interested in looking at how the sequence varies across large numbers of individuals. And so, when we sequence another individual, for most of it their sequence is going to be the same as the reference sequence and so it's a question of comparison and spotting the differences. And so, that's how you deal with a new genome.
Now, there will be unique parts of that. There's a process called assembly where you basically put these fragments together just like the bricks in a wall. They overlap with each other and you hope that they fit together uniquely so you can make a unique sequence. As the length of the fragments gets longer, it becomes easier to build a unique assembly.
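To make the idea of fitting overlapping fragments together concrete, here is a minimal sketch of a greedy overlap merge in Python. It is an illustration only, not the method used at the Sanger Institute: the function names, the toy reads and the `min_len` cut-off are all assumptions, and real assemblers also have to cope with sequencing errors, both DNA strands and repeated sequence.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the largest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    whatever contigs are left (ideally a single sequence).
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return frags
        # Join the pair, writing the shared bases only once.
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
```

With the toy reads `["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]`, `greedy_assemble` returns the single contig `["ATGGCGTACGTTAGC"]`. If a stretch of sequence is repeated and the reads are shorter than the repeat, more than one merge looks equally good, which is one way to see why longer fragments make a unique assembly easier.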
Kat:: So you get all these fragments out, you assemble them all together, you line them all up and you get a whole genome. How many different species have had their whole genomes read in this way?
Dr. Hubbard:: I've lost count. We have in the ENSEMBL database around 50 vertebrates but there are projects being planned to sequence, say, 10,000 vertebrates. I'm only talking here about vertebrates. There's many, many more genomes of pathogens, of bacteria. Basically, it's become so cheap to sequence now, so there's enormous amounts of sequencing going on. It's still quite hard though to make the first version of any particular species. It's much harder to build a reference.
Kat:: So you've got this genome, this book of letters that makes up an organism's DNA. That's very nice. I'm sure it's a nice thing to have. The real challenge I guess is understanding it. How do you figure out what's in there? What makes this species the species it is?
Dr. Hubbard:: Once you have an assembly, the first thing you're interested in is the genes. The genes are the fragments of that genome that are expressed to make physical proteins, which are a part of every cell, and the mechanisms for regulating those genes turning on and off. Say for human, how many genes are there? So for quite a while, we thought the main definition of a gene was something that makes a protein and original estimates were that there might be as many as 100,000. This is all from year 2000 kind of analysis and gradually, that number came down.
The shock analysis at the beginning said that there were only 30,000, and it has come down now to be more like 20,000. It looked like a small set of genes but recently, it's become clear that there's other classes of genes that don't make proteins, that just make RNAs.
So, the process of using a gene is, there's a DNA sequence, you make a copy of a region of that sequence, that's RNA, and then you translate that and manufacture protein. There's a significant set of genes where you never make the protein. You just make the RNA and then the RNA itself is a functional unit. Sometimes it's part of regulation; sometimes it has some sort of structural function. But there seems to be a lot more of those than we thought a few years ago. Exactly how many there are is actually quite hard to say, because they are harder to identify than protein-coding genes: there's a signature that makes it relatively easy to identify protein sequences. RNA is rather more difficult.
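The DNA to RNA to protein steps described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the input is taken to be the coding strand (so transcription is just T to U), only five codons of the standard genetic code are included, and the function names and the toy sequence are made up for the example.

```python
# A small subset of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {"AUG": "M", "GCC": "A", "AAA": "K", "UGG": "W", "UAA": "*"}


def transcribe(dna_coding_strand: str) -> str:
    """Copy a DNA region into RNA (coding-strand shortcut: replace T with U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Read codons from the first AUG until a stop codon and return the protein.

    Codons missing from the partial table are shown as "?".
    Returns an empty string if there is no start codon.
    """
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "?")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("ATGGCCAAATGGTAA"))` gives `"MAKW"`. For the RNA genes mentioned above, the process effectively stops after `transcribe`: the RNA is the product. The start codon, the triplet reading frame and the stop codon are part of the signature that makes protein-coding sequence easier to spot.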
Kat:: I guess these are kind of the regions of the genome that were almost dismissed as junk DNA before. We couldn't figure out what they did. Do you think the concept of junk DNA still exists? Do you think any parts of the genome are junk or will we find that they are all functional?
Dr. Hubbard:: The definition of junk kind of came with whether a region of the genome was conserved or not, and that conservation is kind of based on comparing sequences between species. Is this region similar in mouse or similar in zebrafish, similar in chimpanzee or similar in other humans? We've got better at doing that, but similarity is still based around proteins.
It may be that there's other things that are conserved such as the spacing between elements on the genome. It's conservation but you don't score it in the same sort of way. And it's suddenly clear, now that we can use this DNA technology to find the whole transcriptome, everything that's transcribed in a genome, that we can also use it as another form of assay to spot a whole other layer of modification of the genome, which is the epigenome.
It's a kind of modification layer on top, which seems to at least capture the state of a cell. And so in that sense, most of the DNA is in some sort of state for the epigenome and so you could say, well, how much junk is really left? How much is there that could be classed as junk? Because in fact, most of it seems to be labelled in some sort of way.
Kat:: And we'll be covering epigenetics in a future edition of the Naked Genetics podcast and exploring that a little bit more. But in terms of all the data that we have now, this data about the genes that are active, that are transcribed, the genes in an organism's genome, all these species, all these pathogens, how do you organize it, how do you store it and how do you make it available to researchers?
Dr. Hubbard:: There's pretty much a policy that's come out - partly from the human genome project and the other projects around at that time - of releasing that data immediately. There's been a database for DNA for 30 years now, I think, and it's become policy that when you sequence and publish, it is expected that the data is deposited.
But that's a huge pile of raw data and that's rather hard for researchers to use and so databases have grown up to organize that data. And as the whole genomes have been sequenced, the whole genome has become the organizing principle for DNA.
And so, these projects like ENSEMBL have become a place where you can go and see a genome, and everything that's known about its annotation, about where the genes are, where factors bind, regulatory patterns. In the beginning ENSEMBL was based on vertebrates but because data is now being collected around different cell types, you're beginning to be able to display that information about the epigenetic state of different cell types as well.
Kat:: That was Tim Hubbard, from the Wellcome Trust Sanger Institute.
08:38 - Arguing the age of polar bears
with Safia Danovi
Dr. Danovi:: The first story comes from scientists at Pennsylvania State University and the University of Buffalo in the US. And it's basically about when in history polar bears started evolving out from other types of bear.
Kat:: Because we covered this a couple of months ago and there had been some new research that had suggested that maybe polar bears and brown bears split about 600,000 years ago. But actually, this new data is now putting this much, much earlier.
Dr. Danovi:: Absolutely, I mean what a difference a few months make. So at the moment, they're saying this current data is suggesting that polar bears and brown bears split about 4 to 5 million years ago so it's a massive, massive difference. And I suppose the difference is just down to the techniques that they have used. This new study has done whole genome sequencing of the nuclear genome whereas the last study just looked at various points in the genome. So I guess the devil is really in the detail.
Kat:: And what's quite interesting as well is that people are looking back at what the earth was like 4 to 5 million years ago and saying, well you know, it's obvious you can see about 600,000 years ago there were ice floes and the polar bears had separated geographically. But much further back than that, brown bears and polar bears were co-existing so actually it throws up the idea that maybe there was hybridisation, there was breeding between them and the situation is a lot more murky when we look at the evolution of these bears.
Dr. Danovi:: Absolutely and I think that the data is being contested by another group based in Frankfurt in Germany who are arguing that the ice sheets weren't around 4 to 5 million years ago. So what was the selective pressure that caused polar bears to split? So I suppose this isn't the end of the story. But seriously, who knew that polar bears could be so controversial?
Kat:: And some people are saying, maybe as the ice sheets are vanishing, we'll see polar bears coming back onto the land mass. Maybe could they start to breed with land bears again?
Dr. Danovi:: We might be seeing the emergence of a beige bear.
Kat:: Certainly an interesting thought.
10:32 - Tooth stem cells
with Safia Danovi
Kat:: Now, the other story that I noticed this month was from Emma Juuri and her team at the Institute of Biotechnology in Helsinki and they published this in Developmental Cell. They've been studying teeth, particularly in mice, and they found a particular transcription factor called Sox2 is expressed in stem cells in the front teeth of mice. And this is quite interesting because the front teeth of mice actually grow throughout their life. But in humans obviously, our teeth stop growing so the hunt is on to find what's driving the stem cells to make teeth in mice and maybe we could use this to regrow teeth in humans. What do you reckon about this story?
Dr. Danovi:: I think this is fantastic news. I'm speaking as someone who has absolutely awful teeth and would be very, very pleased to have the ability to grow more teeth from scratch.
But on a serious note, I think it would be really interesting to know if Sox2 is doing exactly the same thing in humans, and we know that the Sox family are a very bossy family of genes. They like telling cells exactly what to do, the most famous example being Sox9, which tells the developing embryo to become a boy. You know, I wait with interest on this story and I think it's going to become commercially quite important.
11:37 - Fish with cancer
with Safia Danovi
Kat:: And now, moving from mice with teeth, we're moving to fish with cancer. What's this final story?
Dr. Danovi:: This is actually a really troubling story in that they found melanoma for the first time in a fish population. This is a wild fish population off the Australian coast out by the Great Barrier Reef and it's worrying because it is a commercially important species of fish. This is the coral trout and it's actually the first time that they found melanoma in fish. So it's interesting and what's particularly noteworthy, is that this part of the coast lies directly beneath the largest hole in the ozone layer.
Kat:: So they think that actually the loss of ozone is pretty much causing over-exposure by UV to these fish and it's causing their cancers?
Dr. Danovi:: Yes, absolutely, I think they've ruled out sort of microbial pathogens and pollution so at the moment, UV seems to be the most likely cause. I think they are going to have to prove that experimentally but they also did show that these melanomas that they found in these fish were quite similar to the melanomas found in humans. So, it wouldn't be surprising if UV was the main cause.
Kat:: But how widespread is this? I mean melanoma in humans isn't that common a cancer. What sort of numbers of fish were getting this disease?
Dr. Danovi:: So, they sampled just under 140 fish and they found around 15% of them had these melanoma lesions but of course, they suspect that the proportion of fish in the wild with melanoma is probably going to be much higher because, obviously, seriously ill fish are probably going to retreat and hide at the bottom of the ocean where they're less likely to be caught and sampled, so we have no idea how widespread this problem is.
Kat:: So what do they plan to do next now that they've found this out?
Dr. Danovi:: I guess their next step would be first to look at the melanomas and characterise them and then of course, it raises the question: are there any other species of fish that are being plagued by this cancer type? And that's actually quite worrying.
Kat:: It's certainly some food for thought.
13:42 - Revitalising heart cells
A team at the Heart Institute at San Diego State University have discovered that damaged heart tissue from older patients with heart failure can be rejuvenated by modified stem cells taken from their own hearts, publishing their findings in the Journal of the American College of Cardiology.
The scientists took samples of cardiac stem cells from elderly patients and added a protein called PIM-1, which helps to promote cell survival and growth. They found that the telomeres in the cells - the caps on the ends of chromosomes - started to lengthen, effectively turning back the genetic clock and making the cells younger. So far, the researchers have tested modified heart stem cells in mice and pigs, and found new heart tissue growth in just a few weeks, opening the door to potential tissue engineering for people with heart failure in the future.
14:27 - New epilepsy gene
Writing in the journal Nature Medicine this month, Dr Eva Jimenez-Mateos and her team of neuroscientists at the Royal College of Surgeons in Ireland have tracked down a new gene involved in epilepsy. Unlike most genes that encode proteins, the new gene makes a microRNA, known as microRNA-134, which is present in much higher levels in the part of the brain that causes epileptic seizures.
The team then used a new type of molecule known as an "antagomir" to remove that particular microRNA from brain cells, and found it prevented seizures for up to a month. The researchers hope their discovery could one day bring hope to people with epilepsy whose condition can't be controlled effectively by medication.
15:08 - Making eggs in adults
A new study in PLoS Genetics has thrown fuel onto a controversial debate in the field of fertility research - the question of whether mammalian females, including women, can make new egg cells after birth or not. Many researchers believe that egg cells are only formed when a female foetus develops in the womb, and no more eggs are made after it's born.
But when scientists from Massachusetts General Hospital and the University of Edinburgh reassessed data from a study in mice published in the journal back in February, they came to the surprising conclusion that there was evidence to suggest that egg cells could divide after birth. Although there's a lot more work to be done to prove it - and it's unknown whether this phenomenon also occurs in humans - it raises the intriguing possibility that adult females may be able to make new egg cells as they age.
15:52 - New childhood disease genes found
A pair of studies published in the journal Nature Genetics unveil gene faults lying behind two rare but debilitating childhood diseases. In the first, researchers at Duke University Medical Centre in the US found the gene mutation responsible for alternating hemiplegia of childhood, or AHC - a condition that causes paralysis of alternate sides of the body, as well as seizures and learning difficulties.
The condition isn't hereditary, so the scientists had to scour the genomes of just seven patients with AHC, comparing them with their parents until they found a gene fault common to all seven. They then went on to show that the mutation, in a gene called ATP1A3, was also present in over three-quarters of AHC patients around the world. While it's a long way from a cure, the scientists hope their finding will increase awareness of the disease and help doctors to make an accurate diagnosis.
The second gene to be tracked down lies behind Leber congenital amaurosis, or LCA - a rare form of blindness that sets in during infancy. After nearly a decade of hunting, the team at the Ocular Genomics Institute in Massachusetts led by Eric Pierce found that faults in a gene called NMNAT1 could cause the disease, bringing the total number of identified LCA mutations to 18.
Unlike other previously identified LCA gene faults, which are in genes involved in light sensing, NMNAT1 helps cells to make energy. They hope that this new discovery could pave the way for interventions to slow or even completely prevent the onset of blindness in these children.
17:42 - Genes and disease
with Dr Carl Anderson, Wellcome Trust Sanger Institute
Dr. Anderson:: So, I don't really like the term "gene for" because we all have that gene and that gene exists in our genome to encode a protein which has a certain function and the function of that protein is critical or is useful to everyday human life. That's why the gene is there, that's why the protein is made and that's why the gene exists.
What happens in certain individuals who have particular genetic conditions, and where this term "gene for" comes from, is that there are specific mutations which occur in that gene and it's those mutations in that gene that change the function of the gene. They change the way that protein functions, the abundance with which the protein is present in our system, or the folding of the protein, making some subtle change to that protein that then changes our risk.
So it's not true that this is a gene "for" breast cancer, let's say. That's not why that gene exists. It's not present in our genome to cause breast cancer. It has some other function, and it's when there's a particular mutation in that gene that it increases the risk of breast cancer, or whatever the disease happens to be.
Kat:: So, we should start talking about a "mutation for" breast cancer or heart disease rather than a "gene for"?
Dr. Anderson:: Exactly, because we all have that gene and quite likely, if we didn't have that gene, we'd be in quite a lot of trouble.
Kat:: And what sort of diseases would you class as a single gene disease and what sort of diseases are these multi-factorial diseases?
Dr. Anderson:: So the single gene diseases are things like cystic fibrosis, Huntington's disease. They're the type of diseases that you get taught when you study genetics at school. So, as genetic diseases, they are the ones where you have the pedigree diagrams and you look to see whether mum has the disease or dad has the disease and how many of the children have the disease and you can see a very clear inheritance pattern. These are the classic Mendelian diseases, single gene diseases.
The more complex diseases are probably the ones that have a greater frequency in the population, so things like autoimmune diseases, type 1 diabetes, Crohn's disease, coeliac disease and things like most cancers, things like hypertension. And not just diseases, you know: there are genetic influences on lots of human traits, so your height, your weight, your overall body shape, all these human traits have genetic influences.
And assessing how important genetics is to each of these traits is a complex thing to do and is one of the jobs of the human geneticist. And once we've found out that genetics has a large role to play in some of these diseases, our next job is to go on and identify those specific regions of the genome that look to be underpinning that genetic risk.
Kat:: So how on earth do you do that? How do you pin down a region of the genome or a particular gene and say that's involved in heart disease, that's involved in schizophrenia?
Dr. Anderson:: It's surprisingly simple actually so basically, what we do, is we take a whole bunch of people who have the disease and a whole bunch of people who don't have the disease. So how big is a bunch? Well, it really depends on the size of the genetic effect that you're trying to find. So if you think, the particular disease you're working on is likely to have genetic effects which are a relatively small effect, let's say, then you're going to need lots of people to do this experiment in.
So in the last five years or so, we started doing these genome-wide association studies and so, what typically happens in a genome-wide association study is we take, say, 3,000 people who have the disease and we compare them to 3,000 people who don't have the disease. And we genotype them at a whole bunch of various positions throughout the genome. And so we end up having, say, half a million or a million sites throughout each of these people's genomes and then we just look to see if, at any of these particular sites, there's a difference between the people who do have the disease and the people who don't have the disease.
And then if we find a site where there is a difference, we look in the reference databases that we have to find whereabouts in the genome that particular site is, what genes lie in the area, and whether that actually tells us something about that particular disease.
Kat:: Presumably, you're not looking at these millions of different variations by eye with a pencil.
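As a rough illustration of what "looking for a difference" at one site can mean, here is a minimal Python sketch of a chi-square test on allele counts in cases versus controls. The choice of test is an assumption (the interview does not say which statistic is used), the function name is hypothetical, and the example counts are invented; only the 3,000 versus 3,000 study size comes from the discussion above.

```python
import math


def allele_chi_square(case_alt: int, case_ref: int,
                      control_alt: int, control_ref: int) -> tuple[float, float]:
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the variant ("alt") and
    reference ("ref") alleles. Returns (chi-square statistic, p-value).
    Assumes every expected count is greater than zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][c] + table[1][c] for c in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With 3,000 cases and 3,000 controls there are 6,000 alleles in each group. If, hypothetically, the variant allele made up 1,800 of the case alleles but only 1,500 of the control alleles, `allele_chi_square(1800, 4200, 1500, 4500)` gives a statistic of about 37.6, a very small p-value. The same test is then repeated at every genotyped site.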
Dr. Anderson:: No.
Kat:: How do you analyse this kind of data?
Dr. Anderson:: So, the type of science that we do at Sanger in these genome-wide association studies is extremely high-throughput, in the sense that you've got lots of genetic data across many thousands of individuals. So the data sets are very large, so we have to use computational tools basically to go in and conduct statistical tests at each one of these sites. And then when we're analysing the results of that, we have to bear in mind how many tests we've performed because, obviously, when you do many tests the chance of you finding a false positive is quite high. So you need to control for all that. So basically, we use the computers to allow us to analyse all these data very quickly and efficiently. And we use statistical methods to try and pull out from all those many thousands and thousands of tests the few interesting sites that remain.
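One simple way to "bear in mind how many tests we've performed" is a Bonferroni correction, sketched below in Python. This is an assumption for illustration: the interview does not name the correction that is actually used, and the function names, site labels and p-values are hypothetical.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Average number of sites passing an uncorrected threshold by chance alone."""
    return alpha * n_tests


def bonferroni_significant(p_values: dict[str, float],
                           n_tests: int,
                           alpha: float = 0.05) -> dict[str, float]:
    """Keep only sites whose p-value beats alpha divided by the number of tests."""
    threshold = alpha / n_tests
    return {site: p for site, p in p_values.items() if p < threshold}


# Hypothetical p-values for three of a million tested sites.
example_sites = {"site_a": 8.6e-10, "site_b": 0.003, "site_c": 4e-7}
hits = bonferroni_significant(example_sites, n_tests=1_000_000)
```

With a million sites and no correction, a cut-off of 0.05 would be expected to let through around 50,000 sites by chance. Dividing 0.05 by a million gives a threshold of 5e-8, and in this made-up example only `site_a` survives it.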
Kat:: Where do you think this kind of research is going to be heading in the future?
Dr. Anderson:: We've been relatively limited in the amount of a particular person's genome that we can survey. So before, we've been just genotyping specific sites in the genome, perhaps a million sites, and using those one million sites throughout the genome to try and infer what's going on throughout the whole genome. Now, with the advent of next generation sequencing, we can get hold of virtually every single base in any particular person's genome. So we've got much more thorough coverage of one person's genome. And so, this really increases the power of our studies.
Also, with the falling costs of those technologies, we can start to do that very thorough survey on many thousands and thousands of people. And so I think, what this is going to allow us to do is it will basically allow us to survey more of the genetic architecture of diseases. Before, we were limited to genetic variants, which were perhaps quite common in the population whereas now, we can actually start to survey down to genetic variants which are perhaps very rare in the population. And indeed, we can probably even get those specific variants which are unique to any one individual person.
Kat:: Do you think that one day in the not too distant future, when a baby is born it will have some blood taken and it will have its genome sequenced?
Dr. Anderson:: I think that's extremely likely. I think the potential medical benefits are quite huge, and I really think that we'll be there soon.
Kat:: That was Carl Anderson from the Wellcome Trust Sanger Institute.
Why are children all so different?
Dr. Zegerman:: So to understand real individuality, you really have to understand how you and I were made. It really starts when a sperm from your dad and an egg from your mum got together and made you as an individual. Sperm and egg are made by a special form of cell division, which is called meiosis. Meiosis is special because it turns a diploid cell, which has two copies of every chromosome, into a haploid cell, which has one copy of every chromosome. So, when a cell goes through the process of meiosis, you take a cell with 46 human chromosomes and you turn it into a cell with one copy of every chromosome, which now means it has only 23 chromosomes.
And very importantly, this process of meiosis is random. Every cell that goes through meiosis will inherit one of every chromosome, but whether it inherits the one from your mother or father is random. So for example, when your father's cells went through meiosis to make sperm, it was random whether you inherited chromosome 2, let's say, from his mother or chromosome 2 from his father. So now, you can see that when you get an egg and a sperm fusing together, as happened for you and your brothers and sisters, this is now a random collection of chromosomes that aren't just from your parents but a random collection of chromosomes they inherited themselves from their parents.
But it's actually much more complicated than just random assortment. During the process of meiosis, chromosome 2 will align next to chromosome 2 and chromosome 4 will align next to chromosome 4 and so on and so forth. And this process is absolutely essential to ensure that each haploid sperm or egg inherits exactly one copy of all the different chromosomes. More than just being essential for the inheritance of every single chromosome, it also allows a very special process to occur whereby chromosomes that are similar but not identical that line up can now exchange pieces of DNA. And this results in making completely unique chromosomes. So now you've swapped a bit of chromosome 2 that your father inherited from his mum with a bit of chromosome 2 that he inherited from his dad. And the end result is a chromosome 2 that is neither your dad's, nor is it your grandma's nor is it your grandfather's. It's completely unique. It's got a bit of both grandma's and granddad's, and that is the chromosome that the sperm that made you inherited. It's completely unique to that sperm.
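The two sources of variety described here, random assortment and the exchange of DNA between paired chromosomes, can be put into a toy Python model. Everything in it is a simplifying assumption made for illustration: each chromosome is a short string of `M` (inherited from the parent's mother) or `P` (from the parent's father), there is exactly one exchange point per chromosome, and the function names are hypothetical.

```python
import random

N_PAIRS = 23  # chromosome pairs in a human cell


def assortment_combinations(n_pairs: int = N_PAIRS) -> int:
    """Number of distinct gametes from random assortment alone (no exchange)."""
    return 2 ** n_pairs


def make_gamete(rng: random.Random, n_pairs: int = N_PAIRS,
                length: int = 10) -> list[str]:
    """Build one haploid gamete: one mosaic chromosome from each pair."""
    gamete = []
    for _ in range(n_pairs):
        maternal = "M" * length
        paternal = "P" * length
        # Random assortment: which copy supplies the start of the chromosome.
        if rng.random() < 0.5:
            first, second = maternal, paternal
        else:
            first, second = paternal, maternal
        # One exchange at a random interior point, so both copies contribute.
        cut = rng.randrange(1, length)
        gamete.append(first[:cut] + second[cut:])
    return gamete
```

Random assortment on its own already gives 2 to the power 23, or 8,388,608, possible sperm or eggs per parent, and therefore 2 to the power 46 possible combinations for one couple. Once the exchange step is added, each chromosome in the toy gamete is a mix such as `MMMPPPPPPP`, which is why real siblings (other than identical twins) essentially never inherit the same set.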
28:04 - Gene of the month - Tribbles
with Kat Arney
Our gene of the month may be familiar to any Star Trek fans who are listening - it's none other than Tribbles.
It was first discovered in 2000 by Thomas Seher and Maria Leptin, who found that cells in fruit flies with mutations in the gene multiply uncontrollably, much like the furry critters in Star Trek. Since then, Tribbles has turned up in many other organisms, including humans, where there's a large family of related proteins that help to control fatty acid production, insulin resistance, cholesterol levels and more.
Unsurprisingly, some of these Tribbles proteins have been implicated in conditions such as type 2 diabetes and the formation of atherosclerotic plaques - the fatty deposits that can block blood vessels and cause heart attacks. Unusually high levels of Tribbles have also been found in leukaemia cells, which fits with the protein's other role in driving cell growth, although I should point out that it doesn't make the cancer cells furry.