Mutated 'junk' DNA could lead to rare health problems

The hidden splicing behind the scenes...

02 December 2024

Interview with Ernest Turro, Icahn School of Medicine at Mount Sinai

Part of the show Debunking 'junk' DNA

For the most part on this programme, we look at the DNA that codes for stuff: the bit of our genome that creates real, tangible proteins or what have you, which go on to change our body in some way. But do you know what percentage of our entire genome, everything stored in our DNA, actually directly codes for stuff? It’s about 1%. So what on Earth is the other 99% doing? Well, for a long time, we had a simple answer: nothing. It was ‘junk DNA’, remnants of bygone protein recipes that we no longer have any need for, or stuff that came in via disease or other means that we have nothing to do with. And for many decades, this opinion persisted. Now, however, we realise that a lot of what goes on behind the scenes in our DNA is heavily controlled and moderated by these non-coding regions. So today, let's have a peek behind that curtain, and debunk the junk.

Let's start with the idea that certain non-coding DNA is responsible for part of the splicing process of specific genes. Say you want to take one recipe out of a cookbook: you need to know where the recipe starts and where it ends, so you don’t accidentally add cake ingredients to a pasta dish. Because this non-coding DNA helps select the right bit of the genome to make into a protein, very rare health problems can present themselves if the process goes wrong. I’ve been speaking to Mount Sinai’s Ernest Turro, author of a new publication on just such a disease, and he began by giving me a rundown of how attitudes towards junk DNA have pivoted so sharply in recent years…

Ernest - The term 'junk DNA' was coined over 50 years ago by an evolutionary biologist, the same year that the term junk food was popularised by nutritionists. And just as junk food had no known nutritional value, junk DNA was used to refer to regions of the genome without a known function. At the time, that would've encompassed almost the entire genome. But over the years, it's become apparent that actually quite a lot of the genome is functional. About 10 years ago, colleagues in Cambridge published a paper pointing to a very large proportion of the genome being biochemically active. I think the figure was 80%, and that statement actually stirred up a bit of controversy at the time. And I think there's no real consensus on what exactly should be considered junk DNA, so the term has fallen considerably out of favour in recent years. I think it might be more useful to think of the genome as containing a vast array of elements of different kinds. Some of those elements are genes. My old Cambridge boss, Willem Ouwehand, would liken them to lights in a room. And other elements are more akin to regulators, or dimmers, for those lights. I think genes are certainly the most important of the elements, and they come in two broad flavours: coding genes and non-coding genes. There are about 20,000 coding genes, which are stretches of the genome that are transcribed into RNA molecules, which are then translated into proteins. So in the case of coding genes, the RNA's purpose is really only to mediate the production of proteins from that gene. But there are also tens of thousands of non-coding genes. These are transcribed into RNA, but it stops there: the RNA is not translated into a protein. My research group specialises in the development and application of statistical methods to discover the genetic changes, or variants, in human genomes that are responsible for rare diseases. And these disease-causing changes might happen in regions of the genome that previously would've been considered to be junk DNA.

Will - Given that your research focuses on rare diseases, what do you think these non-coding genes you looked at (that's what we should probably call them) are actually responsible for?

Ernest - So our publication in May in Nature Medicine concerns a non-coding gene called RNU4-2. We found that genetic variants in this gene were responsible for a neurodevelopmental disorder that we estimate affects tens of thousands of people around the world. In 2021, Daniel Green and my group completed an initial genetic association study of all the genes, both coding and non-coding, across all the rare diseases in this very impressive Genomics England dataset. We identified hundreds of associations with coding genes and also a few very promising associations with non-coding genes. So we decided to write up a paper on the coding genes first, which appeared in Nature Medicine last year, before moving on to the non-coding genes. And when we returned to our analysis of non-coding genes, we immediately focused on this really small non-coding gene called RNU4-2. This very small non-coding gene, it's only 141 bases long, contains variants which are responsible for this neurodevelopmental disorder, which is now called RNU4-2 syndrome or ReNU syndrome. The syndrome features intellectual disability and other traits such as short stature, small heads and seizures. And what's really striking about this is that we estimate it affects about 1 in 25,000 young people. That would make it amongst the most common monogenic causes of neurodevelopmental disorders known to date.

Will - How, though, could a mutation in a non-coding gene lead to a neurodevelopmental disorder?

Ernest - That's a really good question. The RNA that this gene encodes is called U4, and it is one of the five small RNA components of a molecular complex called the spliceosome. Spliceosomes, as their name suggests, play a crucial role in splicing exons together within the nucleus. Exons are like linear islands in an archipelago. They are separate regions of the genome, which are transcribed into RNA and then spliced together to form a mature RNA molecule, which can then be translated into a protein. There are two types of spliceosome in humans: the major spliceosome and the minor spliceosome. And in fact, defects in two small non-coding RNAs in the minor spliceosome have previously been tied to rare disease. Now, these RNU4-2 mutations that we published earlier this year represent the first reported non-coding defect in the major spliceosome.
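To make the archipelago picture a little more concrete, here is a toy sketch in Python. It assumes made-up sequences and hand-picked splice positions (the names `EXON_1`, `INTRON`, `EXON_2`, `splice` and `codons` are all hypothetical, and nothing here is taken from RNU4-2 or the published study). It joins two "islands" of RNA while discarding the intron between them, then shows how taking the second exon from one base too early shifts every downstream three-letter codon and loses the stop signal. It illustrates the general idea behind Will's "cutting at the wrong place" question only; it does not model how U4 or the spliceosome actually recognise splice sites.

```python
# Toy illustration of splicing with hypothetical sequences (not a real human gene).
EXON_1 = "AUGGCCAAA"        # codons AUG GCC AAA
INTRON = "GUAAGUCCUUAG"     # begins with GU and ends with AG, like most introns
EXON_2 = "GGCUGGUAA"        # codons GGC UGG UAA (UAA is a stop codon)
PRE_MRNA = EXON_1 + INTRON + EXON_2

STOP_CODONS = {"UAA", "UAG", "UGA"}


def splice(pre_mrna, exons):
    """Join exon segments given as (start, end) slices, 0-based and
    end-exclusive, discarding everything in between (the introns)."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def codons(mrna):
    """Read the mRNA in triplets from the first base, dropping leftover bases."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


# Correct splice sites: exon 2 starts right after the intron's final AG.
intron_end = len(EXON_1) + len(INTRON)
correct = splice(PRE_MRNA, [(0, len(EXON_1)), (intron_end, len(PRE_MRNA))])

# Hypothetical mis-splicing: exon 2 is taken to start one base too early,
# so the last G of the intron is kept in the mature RNA.
shifted = splice(PRE_MRNA, [(0, len(EXON_1)), (intron_end - 1, len(PRE_MRNA))])

print(codons(correct))  # ['AUG', 'GCC', 'AAA', 'GGC', 'UGG', 'UAA']
print(codons(shifted))  # ['AUG', 'GCC', 'AAA', 'GGG', 'CUG', 'GUA']
print(any(c in STOP_CODONS for c in codons(shifted)))  # False: stop codon lost
```

A single extra base is enough to change every codon after the junction, which is why even small splicing errors can, in principle, produce a very different protein.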

Will - So, just so I've got this straight: the idea is that these non-coding genes help splice coding genes, but when they're mutated they may cut at the wrong place, so the pieces are joined back together incorrectly, leading to, say, an incorrect protein, which could in turn lead to a neurodevelopmental disorder?

Ernest - That's exactly it, that's the running hypothesis. We just haven't really been able to find a large-scale defect in this splicing process in the mature cells that we've studied from patients. That's why lab work in which you take cells, differentiate them into cell types that we can't study from patients, and then look at the splicing in those cells could shed some light on where the splicing problems occur and in which cell types this variant splicing happens.

Will - It seems extraordinary that we've spent so long ignoring this huge other part of our genome. And I guess the mind boggles at how many other neurodevelopmental but also physical disorders could be down to errors going on in non-coding DNA that we've previously ignored.

Ernest - That's right. I mean, the cost of sequencing was such that there was a case to be made for only sequencing the protein-coding genes initially. If you can sequence fifty patients instead of one by restricting the regions of the genome that you sequence to the protein-coding genes, then that might initially lead to more scientific findings than if you sequence the entire genome, because it's reasonable to suspect that many, if not most, of the variants that cause diseases are located in protein-coding genes. But by the late 2010s, the cost of sequencing a whole genome had gone down quite drastically, and that made it much more attractive to sequence entire genomes in large numbers of patients. That is what's allowed us to discover that mutations in some non-coding genes can indeed cause really quite severe diseases in humans.

Will - I'm envisaging a future where this sort of research, looking more and more into non-coding DNA, could go hand in hand with better, earlier diagnoses of neurodevelopmental disorders, such as the one you found, or perhaps autism. Do you see that as a potential future?

Ernest - Absolutely. One of the things that has happened as a consequence of this discovery is that all the big clinical genetic testing labs are scrambling to adapt their tests so that they can pick up sequence variants in this small gene. And one of the issues with tests that are restricted to specific regions of the genome is that they need to be adapted every time there's a new discovery. If that discovery is in the non-coding parts of the genome, then the test will probably need to be adapted.