Titans of Science: David Baker
Our Titans of Science season continues with the man who used AI to create an unprecedented number of custom proteins: Nobel Prize winning biochemist David Baker…
In this episode
01:06 - David Baker: Why do proteins matter?
In this edition of Titans of Science, Chris Smith chats to David Baker, the Nobel Prize winner who used AI to design custom proteins...
Chris - David Baker was born in Seattle on the 6th of October, 1962, to two parents who were themselves both scientists. He attended Garfield High School in the city before he read biology at Harvard University. On graduation, he then began working on how proteins are transported around cells. Later, he would go on to pioneer methods to design proteins and predict their three dimensional structures, and that helped to earn him a share of the 2024 Nobel Prize in Chemistry. David co-founded several biotechnology companies and he was included in Time magazine's inaugural list of the 100 most influential people in health. He's now the director of the University of Washington's Institute for Protein Design. Welcome to the show, David, and congratulations on your Nobel Prize. How did your interest in science get started?
David - Well, perhaps surprisingly, I really became interested in science relatively late. I'm not one of those people who's fascinated by science from an early age. In fact, when I got to the university, I initially declared my major to be social studies, and later I got interested in philosophy. And it was really not until my last year of college that I decided to switch to science and focus on biology.
Chris - But proteins, I mean, I remember my biology teacher at school, when I wasn't very old, obsessing about proteins. I really didn't get what all the fuss was about. Why did they matter to you? What drew you into that and why do they matter to anybody?
David - Well, in fact, at the time I was in university, I had no idea what proteins were either until I took a biology class. And it seemed interesting to me, but it wasn't until quite a few years later that I really started becoming obsessed. And I'll tell you why. In nature, you know, biological organisms, animals, humans do all kinds of really, really amazing things. And if you look in detail at how those things are accomplished, at the heart of everything are proteins. So there are specialised proteins that mediate the electric currents through our brains while we're thinking and talking. There are proteins that allow us to move around. There are proteins that enable plants to capture energy from the sun and use it to make molecules. Basically everything that goes on in biology is done by proteins. So the way that biology works is there are all these different jobs, thousands and thousands of jobs in any organism. And for each job there's a specific protein. So you can kind of think of proteins as the miniature machines, which do all the important things in life.
Chris - And what's their structure? How would I recognise one?
David - You wouldn't recognise one if you saw it because they're extremely small. They're just one nanometre across, which means that you need a billion of them to get to a metre. But each protein has a very well-defined shape. That's one of the really kind of miraculous things about proteins. If you think about the machines that we're used to encountering in real life, each machine has a very defined shape, which is really important for it to do its job. Like a car has wheels so it can roll, and it's got an interior compartment you can get into. And in the same way every protein has a very defined shape. And those shapes are really what lets the proteins do what they do.
Chris - How do they come by that shape though?
David - Proteins carry out all the work in our bodies and all living things. Like I said, the instructions for making proteins are in our genomes in the DNA. And so the DNA in our genomes specifies what the chemical structure of each protein is. That is what the sequence of amino acids is going to be. Proteins are made out of amino acids. There are 20 different types of amino acids, and a protein is a linear chain of about 100 to 500 amino acids. And the sequence of amino acids of a protein completely determines what its shape is. And that sequence, as I said, is specified in the genes in our genomes.
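To put numbers on the figures David gives here (20 amino acid types, chains of roughly 100 to 500 residues), the short Python sketch below counts how many distinct sequences of a given length are possible. It is only back-of-envelope arithmetic: the function names are made up for this illustration, and it assumes every position in the chain can hold any of the 20 types.

```python
NUM_AMINO_ACID_TYPES = 20  # figure quoted in the interview


def sequence_count(length: int) -> int:
    """Number of distinct chains of this length, assuming any of the
    20 amino acid types can sit at every position."""
    return NUM_AMINO_ACID_TYPES ** length


def order_of_magnitude(length: int) -> int:
    """Power of ten of sequence_count(length), read off the digit count."""
    return len(str(sequence_count(length))) - 1


if __name__ == "__main__":
    for length in (100, 500):
        print(f"chains of {length} amino acids: "
              f"about 10^{order_of_magnitude(length)} possible sequences")
```

Even at the short end of the range the count dwarfs anything that could be tried one by one, which is part of why both prediction and design lean so heavily on computation.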
Chris - And those amino acids are all different chemically. So they can have different shapes, different structures, different sizes, different electrical behaviours. And so that means depending upon which ones you slot in, it's a bit like building a wall with different shaped bricks. You are going to get a different shaped wall or a different coloured wall or a wall with interesting properties if you put different amino acids into it.
David - Yes, that's a very good analogy. It's kind of like having a universal building block kit, kind of like a child's construction toy where you can basically make any shape by having the right combination of amino acids. And it's not just the shape, the machine also has to interact with whatever it's operating on in the right way. So by having certain types of amino acids on the surface of a protein that will enable it to interact with other proteins or with DNA and carry out the job it's meant to do in biology.
06:02 - David Baker: Hijacking screensavers to design proteins
Chris - By the time you were obviously doing your studies and getting into these sorts of questions, it had been nearly half a century since Watson and Crick worked out what DNA did, and that it was storing the code that told cells how to make these magical things, proteins that were gonna go on to dominate your life. So what was it at the time that intrigued you? What was the unknown or the unanswered that you wanted to explore further?
David - There was a very important observation made in the 1970s or late 1960s, which was that if you took a protein and you pulled it apart, it would fold back up to the same shape. And that was what really proved that the shape of a protein was determined by its sequence of amino acids. But nobody knew how that worked, how that code going from amino acid sequence to three dimensional structure worked. And no one really understood what this folding up process was. And I became fascinated by that when I joined the University of Washington as a professor many years ago. I think I found it fascinating because it's kind of the simplest case of self-organisation in biology. So if you think about all the non-living things around us they're not really organised. They're kind of all random. But when you look at your pet, a dog or a cat, or your brother or sister, you know, animals and plants are really highly organised. That's how they're different from the rest of the world. And proteins are really the simplest case because proteins are made out of thousands of atoms, and you would expect them just to be completely random jumbles of different shapes. But yet they just have one shape. And so I was really fascinated by it from that point of view. And also, like I said, proteins carry out all the important jobs in living things. And so I thought if we could understand how proteins fold up, we might be able to actually make new proteins at some point, which could then have a huge number of possibilities.
Chris - So we would think, 'well, I want to have a miniature machine at the scale of atoms capable of doing job X. And I think it would have to have shape Y to do that, and therefore I would design a bespoke protein that would do that.' That was sort of the end goal you had in mind at the time.
David - Yes, exactly.
Chris - Well what was stopping you doing that?
David - Well, it's very complicated, the process of going from an amino acid sequence to three-dimensional structure or three-dimensional shape, or going backwards from a three-dimensional shape to a sequence of amino acids that will encode it. As I said, proteins have many thousands of atoms. And so the way we started working on this problem was to try and model all of the interactions between all of those atoms. So to think about the process of a protein folding up, we sort of developed methods for actually modelling that folding up of the very long protein chain, guided by all these thousands of interactions between atoms. And we were able to make some progress on that problem. And then we realised we could go backwards and take a brand new shape and figure out what combination of atoms and what amino acids would have the property that they would actually fold up to that shape. We were actually able to do that about 20 years ago and design a new amino acid sequence that folded up to a new shape. And that kind of really opened the door to a lot of possibilities because once we could make new shapes completely from scratch, it seemed like we should be able to design new functions. Given any job that you might want a protein to do, we should be able to design a protein to do it.
Chris - Is it tricky to do this sort of thing because all these atoms are different? They're different sizes, different masses, different electrical charges. And that's going to mean that if you just had two atoms to consider, A and B, you could quite easily understand how they would probably stick to each other or want to attract or repel each other. Once you've got thousands of them and they're all doing their own thing independently, you've got thousands and thousands of possibilities to consider. Is that why it is a difficult problem to solve?
David - Yes. That's one of the reasons, because there's so many of them and there's so many different possible combinations. The other thing that makes it challenging is we don't know exactly the details of how those atoms or amino acids interact with each other. And so for that reason, there are errors when we do these calculations, which have to be done on a computer because there's so many atoms. There are so many different possible places each atom could be, and so many interactions between the atoms. We kept improving our description of those interactions and our ability to try out more and more different combinations of atoms. And that's why we were able to make progress in designing more complicated proteins.
Chris - It does sound, though, like something that would be more tractable with heavy duty computing, because you can ask a computer to consider all of those different possibilities, and it may take a while, but wasn't that what we built supercomputers to be able to do loads and loads of calculations and model this sort of thing?
David - That is exactly what we reasoned. So in fact we started a distributed computing project called Rosetta@home where we enlisted people all over the world who had computers at home and who were running screensavers to run a screensaver that would do these calculations. It would both predict protein structures and it would design brand new structures, and we were able to enlist quite a few volunteers. And actually Rosetta@home became equivalent to a medium sized supercomputer. And so that, I think community involvement in our science has always been really important. Rosetta@home then led to, we had a screensaver, so you could see the protein getting designed or folding up on the computer screen, and people would watch the screensaver and they would write to us and say, 'you know, it's really cool, but I think I could do better.' And that led us to develop an interactive game called Foldit, where the participant could not only have their computer fold the protein, but they could get in and actually guide it.
Chris - So this is sort of like the biochemical equivalent of what went on to become the SETI@home concept, wasn't it? People downloading bits of data to their home computer, which could then be crunched on their computer, their electricity bill, very crafty David. And that led to a resolution or an analysis of a piece of data that when brought back together, you had that enormous wealth of computing power by harnessing thousands of computers around the world.
David - Yes, that's exactly right. In fact we didn't have to build up the infrastructure for this because the SETI@home group and developers were incredibly generous and they actually helped us to use their entire platform and infrastructure for connecting many, many, many personal computers together to do these calculations. Except the difference was, rather than processing signals, radio signals, the participants in our project were folding and designing proteins.
Chris - What were you sending to the users when they were first doing this, before you got onto the game that you just mentioned, when you were just showing them data that was being crunched? What were they picking up from your server? What was their computer doing? What was it showing them that some people were then latching onto and saying, I think I know how to improve on this?
David - Well, when we were trying to figure out how a protein folded up, as we said, there are many, many different shapes that any protein could have. So we would send out just the amino acid sequence of the protein, and then the participant's computer would fold it up and it would send us back the folded structure. And what we get back from this is hundreds of thousands of different possibilities for how the protein could fold up, from which we could identify those that were most likely to be the correct solution. And the principle we use to evaluate that is similar to the expectation, when you have a ball rolling on a bumpy surface, that it will eventually end up at its lowest elevation point. Well, similarly, as I said, in these calculations we look at the interactions between all the atoms and we calculate the energy of each protein in that way. And so we select out those proteins, those shapes, for which the amino acid sequence had the lowest energy.
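The ball-on-a-bumpy-surface idea can be sketched in a few lines of Python. This is a toy illustration only, not the Rosetta@home code: the one-dimensional `energy` function, the step sizes and the function names are all invented for the example. Each simulated "volunteer" starts at a random point, rolls downhill by accepting only moves that lower the energy, and reports where it ended up; the lowest-energy result among all the reports is then selected.

```python
import math
import random


def energy(x: float) -> float:
    """Toy 'bumpy surface': a broad bowl with ripples (hypothetical, 1-D)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def relax(x: float, rng: random.Random, steps: int = 200,
          step_size: float = 0.1) -> float:
    """Roll downhill: try small random moves, keep only those that lower the energy."""
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        if energy(trial) < energy(x):
            x = trial
    return x


def best_of_many(n_runs: int = 500, seed: int = 0) -> float:
    """Run many independent searches from random starts and keep the lowest-energy result."""
    rng = random.Random(seed)
    candidates = [relax(rng.uniform(-10.0, 10.0), rng) for _ in range(n_runs)]
    return min(candidates, key=energy)


if __name__ == "__main__":
    best = best_of_many()
    print(f"lowest-energy position found: {best:.3f} (energy {energy(best):.3f})")
```

A single run often gets stuck in whichever dip is nearest to its starting point, which is why pooling many independent runs and picking the lowest energy matters. A real protein has thousands of coordinates rather than one, so the same idea needs vastly more computing.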
Chris - And what were the users seeing that enabled them to point out to you that they could probably help you out a bit more?
David - They saw the protein, the computer, trying out many, many different possible shapes and kind of jumping around a lot. And the participants would look at this and see that it looked like it was flailing around and not really going in the right direction a lot of the time. And that's really what led to Foldit.
Chris - And Foldit was them being able to say, well, hang on, let's actually manipulate that bit to there, this looks to me as the right way a protein should fold. But these weren't protein chemists, these people, these are just computer users at home, weren't they?
David - That's right. They were. And so an important part of Rosetta@home and Foldit, critically Foldit because people were going to do it themselves, was a series of pedagogical introductory levels where you would go through puzzles of increasing difficulty that were designed to teach you the principles of biochemistry.
15:36 - David Baker: How to win a Nobel Prize
Chris - This was about the turn of the millennium, early 2000s, wasn't it, that you were doing this? So that got you a bit further along. But there was still clearly a gap because this didn't immediately revolutionise our ability to predict proteins, or you would've won the Nobel Prize a lot longer ago than you have. So what was still the stumbling block at that stage then?
David - Well, the stumbling blocks for both structure prediction and design were just the ones that I described, that proteins are very complicated and they're made out of many thousands of atoms. So it was really hard to do accurate calculations and get really accurate structure predictions, for example. On the design side, we were able to design more and more powerful proteins doing a wider and wider range of jobs, but we had to try a lot of different designs to find one that really worked well and solved the problem that we intended it to solve. So the real game changer was the advent of deep learning, and that was really demonstrated in a spectacular fashion by the DeepMind team, my co-laureates John Jumper and Demis Hassabis, who showed that the database of protein structures was sufficiently large that one could learn from it the rules of protein folding and go from an amino acid sequence directly to a three dimensional structure. So I have to tell you one thing though, just to put this in context. Before it was possible to predict the structure of a protein from its amino acid sequence, scientists around the world spent many, many years, and actually still do, determining the structures of proteins experimentally. That means figuring out where in space each atom of a protein is. And they do this in a number of ways. For example, one of the most powerful is shining x-rays at a crystal of the protein and figuring out how those x-rays scatter. And that gives you direct information on the position of atoms. Now, tens of thousands of scientists over 50 years, at an expense of tens of billions of dollars or more, spent their careers determining the structures of proteins. And many scientists, great scientists, are continuing to solve the structures of more and more complex proteins.
And so what this led to was a database of about 200,000 different protein structures, and each protein structure specifies exactly where each atom in that protein is relative to the others. So it's this incredibly rich storehouse of information. And what the DeepMind group showed is that this information store was sufficiently detailed and rich, that you could really learn the rules and predict structures of proteins from their sequence.
Chris - You feed in to the artificial intelligence all of that wealth of information where people have painstakingly worked out where the atoms are in three dimensional space in each of those proteins. So it can then learn. And that presumably means you can then feed it an unknown protein, an amino acid sequence. These are the building blocks of a protein you've never seen before. And it can apply the same rules to then work out what it would look like.
David - That's exactly right. So the program that the DeepMind group developed is called AlphaFold. AlphaFold was trained on all the amino acid sequences of proteins of known structure. It was trained to predict the structure. And so now you can give a new amino acid sequence to AlphaFold, and it will generate the predicted structure for it.
Chris - One of the things that the award committee said was that you achieved the almost impossible feat of making new proteins. So this was essentially upstream of what we've just said. You proved that you could make a new protein from scratch, you could come up with a concept and design it. And I suppose what the DeepMind team then did was to equip you with a way of doing that far faster.
David - Well, yes. So as I described, we started designing proteins long before deep learning was even a well established field. And we used this sort of atomic description that I described earlier where we had to model all the interactions between pairs of atoms, and we used that approach to design completely new proteins. And that was what was cited by the Nobel Committee. That was back in 2003. After DeepMind showed that protein structure prediction could be greatly enhanced using deep learning, we naturally moved very quickly to apply deep learning to protein design. And what we found is that we were able to develop very powerful methods for designing brand new proteins that were much better than the previous sort of methods based on this cloud of atoms I described earlier. And using these new design methods, we can design proteins that have a very wide range of different functions, and we have made these methods freely available to anyone in the world. And so it's very exciting now because we're seeing many different research groups designing new proteins using the deep learning methods we've developed. 10, 15 years ago the idea of trying to solve a problem in biotechnology or in sustainability with a designed protein just sounded totally crazy, on the lunatic fringe. But now there's really great interest in designing new proteins to solve problems in medicine, in sustainability and technology. So it's a very exciting time.
Chris - Could you, for example, to think about how we might deploy something like this. Could you say, 'well look, ocean and marine plastic pollution, that's a major headache. I want to design an enzyme that has never existed in nature. It's a protein that can attack plastic in the ocean and get rid of it.' Could we throw that sort of problem at this sort of solution now and begin to build protein machines that would do that sort of job for us?
David - That is exactly the type of problem that we're working on now. So there are several extremely talented researchers in my group who are working specifically on that to design catalysts that will break down plastic. We're also working on ways, new ways, to fix CO2 as well as new proteins that will very specifically target cancer cells in the body. So you can treat the cancer without systemic effects. It's an exciting time also because we have our first medicines that have been approved for use in humans. And that's a vaccine, a Covid vaccine developed by my colleague Neil King at the Institute for Protein Design here.
22:21 - David Baker: The future of protein production
Chris - People often say that it gets interesting when things break or don't work. So when you do this, are there any things that trip up these artificial intelligences, things that they consistently get wrong and shouldn't? Because often there might be something interesting lurking in there. Have you noticed anything like that?
David - In fact, in every problem we work on, we only work on problems which are kind of at the cutting edge of what's possible. Because the really easy problems we figure people in other places could do with the software we're releasing. And whenever you work on a hard problem, you only understand about 30, 40% of what's going on. And so one of the things, the really key thing, is you start working on a problem like targeting a tumour or breaking down plastic. And the first few designs you make don't work or they don't work very well, and then you have to look at what's going on that's wrong. Then that gives you ideas on what you need to improve about your design strategy or the methods to really solve those problems. And so that's really largely what science is about, is having some hypothesis about how to solve a problem, trying to solve it, and then it doesn't work as well as you thought. And then trying to figure out what the basis for that is and improving your method and approach accordingly.
Chris - Some branches of science are also now going down the synthetic route where, when we began this conversation, you explained a protein is something made from a combination of 20 different amino acids, but we can as clever chemists now make amino acids that don't exist in nature. So we can therefore do chemistry that may not exist in nature. Can the artificial intelligences be brought to bear using these novel chemicals though? Because of course we won't have that vast database of proteins that use these new chemicals we're creating to train on.
David - Exactly. So this is where the previous methods that were based on modelling all the interactions between the atoms still are very useful. So we're trying to do exactly what you described, build catalysts now that incorporate unnatural amino acids and unnatural co-factors into our designs. It's like our machine now has this kind of totally new powerful thing in it that will allow it to do more sophisticated chemistry. And this is where we're combining the new deep learning methods, which as you pointed out are really used to just seeing the natural 20 amino acids, with the previous methods that I described where we're modelling everything as just a collection of atoms using physical principles. That combination is powerful because those older physically-based methods have no problem modelling that unnatural amino acid or co-factor as just a collection of atoms connected by bonds.
Chris - It's so exciting all of this and you can really see how this is going to translate and quickly into really groundbreaking stuff, can't you? But when you go home at the end of the day, is your head spinning because of this or do you have crafty ways of managing to relax or sort of get away from it and not think about protein structures for a while?
David - What I should tell you about is a little bit about my work environment. Because this area is so exciting now, there are many, many brilliant, super motivated, energetic people at all career stages from around the world who are coming to my group to explore new areas like breaking down plastic or fixing CO2 and it's really an amazing place. We don't have very much space, so everyone's very close together and everyone's kind of talking and brainstorming about the next frontier. And because it's a big group, there's new exciting results popping up pretty much every day. And so it's this incredibly exhilarating environment. So I have to say at the end of the day, I spend my day just talking to people in my group sort of individually or in groups and it's super fun, but I would say at the end of the day, my head is spinning a little bit. So it's both the potential of the problem and there's so much activity and exciting progress. So I live in Seattle, which is fortunate because I love the mountains. So on the weekends, I try and get up skiing or hiking or climbing, pretty much every weekend. And, so right now in fact it is a little rainy in Seattle, but that means it's snowing in the mountains. So I'm excited to get out and ski this weekend.
Chris - Good for inspiration, I should think, as Kary Mullis, who got the Nobel Prize a number of years ago for discovering and coming up with the idea of the PCR reaction to copy DNA, told me he came up with that concept while driving up to his mountain cabin in Mendocino. So maybe your trips out into the great back of beyond are very inspirational.
David - Yeah, they certainly help me preserve sanity, which is very good.
Chris - Well, thanks very much for telling us all about it, David. Congratulations once again on your Nobel Prize and I hope that with the Nobel Prize comes a bigger office because it sounds a bit cramped.
David - Well, so far I would say the Nobel Prize has been remarkably useless in trying to improve our research conditions or resources, but I'm still hopeful.