David Baker: Hijacking screensavers to design proteins

Taking a page from SETI's book...
10 December 2024

Interview with David Baker

IPD scientists

In this edition of Titans of Science, Chris Smith chats to David Baker, the Nobel Prize winner who used AI to design custom proteins...

Chris - By the time you were obviously doing your studies and getting into these sorts of questions, it had been nearly half a century since Watson and Crick worked out what DNA did, and that it was storing the code that told cells how to make these magical things, proteins that were gonna go on to dominate your life. So what was it at the time that intrigued you? What was the unknown or the unanswered that you wanted to explore further?

David - There was a very important observation made in the 1970s or late 1960s, which was that if you took a protein and you pulled it apart, it would fold back up to the same shape. And that was what really proved that the shape of a protein was determined by its sequence of amino acids. But nobody knew how that worked, how that code going from amino acid sequence to three-dimensional structure worked. And no one really understood what this folding-up process was. And I became fascinated by that when I joined the University of Washington as a professor many years ago. I think I found it fascinating because it's kind of the simplest case of self-organisation in biology. So if you think about all the non-living things around us, they're not really organised. They're kind of all random. But when you look at your pet, a dog or a cat, or your brother or sister, you know, animals and plants are really highly organised. That's how they're different from the rest of the world. And proteins are really the simplest case because proteins are made out of thousands of atoms, and you would expect them just to be completely random jumbles of different shapes. And yet they just have one shape. And so I was really fascinated by it from that point of view. And also, like I said, proteins carry out all the important jobs in living things. And so I thought if we could understand how proteins fold up, we might be able to actually make new proteins at some point, which could then open up a huge number of possibilities.

Chris - So we would think, 'well, I want to have a miniature machine at the scale of atoms capable of doing job X. And I think it would have to have shape Y to do that, and therefore I would design a bespoke protein that would do that.' That was sort of the end goal you had in mind at the time?

David - Yes, exactly.

Chris - Well, what was stopping you doing that?

David - Well, it's very complicated, the process of going from an amino acid sequence to a three-dimensional structure or three-dimensional shape, or going backwards from a three-dimensional shape to a sequence of amino acids that will encode it. As I said, proteins have many thousands of atoms. And so the way we started working on this problem was to try and model all of the interactions between all of those atoms. So to think about the process of a protein folding up, we developed methods for actually modelling that folding up of the very long protein chain, guided by all these thousands of interactions between atoms. And we were able to make some progress on that problem. And then we realised we could go backwards and take a brand new shape and figure out what combination of atoms and what amino acids would have the property that they would actually fold up to that shape. We were actually able to do that about 20 years ago and design a new amino acid sequence that folded up to a new shape. And that really opened the door to a lot of possibilities because once we could make new shapes completely from scratch, it seemed like we should be able to design new functions. Given any job that you might want a protein to do, we should be able to design a protein to do it.

Chris - Is it tricky to do this sort of thing because all these atoms are different? They're different sizes, different masses, different electrical charges. And that's going to mean that if you just had two atoms to consider, A and B, you could quite easily understand how they would probably stick to each other or want to attract or repel each other. Once you've got thousands of them and they're all doing their own thing independently, you've got thousands and thousands of possibilities to consider. Is that why it is a difficult problem to solve?

David - Yes. That's one of the reasons, because there are so many of them and there are so many different possible combinations. The other thing that makes it challenging is we don't know exactly the details of how those atoms or amino acids interact with each other. And so for that reason, there are errors when we do these calculations, which have to be done on a computer because there are so many atoms, so many different possible places each atom could be, and so many interactions between the atoms. We kept improving our description of those interactions and our ability to try out more and more different combinations of atoms. And that's why we were able to make progress in designing more complicated proteins.
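To make the scale of that bookkeeping concrete, here is a minimal Python sketch of summing an interaction energy over every pair of atoms. It is an illustration only: the Lennard-Jones formula and the names `lennard_jones`, `total_energy` and `pair_count` are assumptions chosen for this example, not the energy function used in the Rosetta software.

```python
import itertools
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Toy pair energy (an assumption for illustration):
    strongly repulsive at short range, weakly attractive further out."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def total_energy(coords):
    """Sum the toy pair energy over every distinct pair of atoms.

    coords is a list of (x, y, z) tuples, one per atom.
    """
    return sum(
        lennard_jones(math.dist(a, b))
        for a, b in itertools.combinations(coords, 2)
    )


def pair_count(n_atoms):
    """Number of distinct atom pairs that must be considered."""
    return n_atoms * (n_atoms - 1) // 2


if __name__ == "__main__":
    # Three atoms on a line, spaced 1.2 units apart.
    atoms = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (2.4, 0.0, 0.0)]
    print("energy of toy system:", total_energy(atoms))
    print("pairs among 1,000 atoms:", pair_count(1000))
```

The pair count grows with the square of the number of atoms, and that is for a single arrangement; every candidate shape of the chain needs the sum recomputed, which is why the calculation has to run on a computer.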

Chris - It does sound, though, like something that would be more tractable with heavy-duty computing, because you can ask a computer to consider all of those different possibilities. It may take a while, but wasn't that what we built supercomputers to be able to do: loads and loads of calculations, and model this sort of thing?

David - That is exactly what we reasoned. So in fact we started a distributed computing project called Rosetta@home, where we enlisted people all over the world who had computers at home and who were running screensavers to run a screensaver that would do these calculations. It would both predict protein structures and design brand new structures, and we were able to enlist quite a few volunteers. And actually Rosetta@home became equivalent to a medium-sized supercomputer. And so I think community involvement in our science has always been really important. With Rosetta@home we had a screensaver, so you could see the protein getting designed or folding up on the computer screen, and people would watch the screensaver and they would write to us and say, 'you know, it's really cool, but I think I could do better.' And that led us to develop an interactive game called Foldit, where the participant could not only have their computer fold the protein, but they could get in and actually guide it.

Chris - So this is sort of like the biochemical equivalent of what went on to become the SETI@home concept, wasn't it? People downloading bits of data to their home computer, which could then be crunched on their computer, on their electricity bill. Very crafty, David. And that led to a resolution or an analysis of a piece of data so that, when it was all brought back together, you had that enormous wealth of computing power by harnessing thousands of computers around the world.

David - Yes, that's exactly right. In fact we didn't have to build up the infrastructure for this because the SETI@home group and developers were incredibly generous, and they actually helped us to use their entire platform and infrastructure for connecting many, many, many personal computers together to do these calculations. The difference was that, rather than processing radio signals, the participants in our project were folding and designing proteins.

Chris - What were you sending to the users when they were first doing this, before you got onto the game that you just mentioned, when you were just showing them data that was being crunched? What were they picking up from your server? What was their computer doing? What was it showing them that some people were then latching onto and saying, 'I think I know how to improve on this'?

David - Well, when we were trying to figure out how a protein folded up, as we said, there are many, many different shapes that any protein could have. So we would send out just the amino acid sequence of the protein, and then the participant's computer would fold it up and send us back the folded structure. And what we would get back from this is hundreds of thousands of different possibilities for how the protein could fold up, from which we could identify those that were most likely to be the correct solution. And the principle we use to evaluate that is similar to the expectation, when you have a ball rolling on a bumpy surface, that it will eventually end up at its lowest elevation point. Well, similarly, as I said, in these calculations we look at the interactions between all the atoms and we calculate the energy of each protein in that way. And so we select out those proteins, those shapes, for which the amino acid sequence had the lowest energy.
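The ball-on-a-bumpy-surface idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Rosetta@home code: the one-dimensional landscape `bumpy_energy`, the downhill-only search in `volunteer_run`, and every parameter value are hypothetical stand-ins for a real protein energy function and folding search.

```python
import math
import random


def bumpy_energy(x):
    """Hypothetical 1D landscape: a broad bowl with ripples,
    so there are many local dips and one lowest point."""
    return 0.1 * x * x + math.sin(3.0 * x)


def volunteer_run(seed, steps=2000, step_size=0.05):
    """One simulated work unit: start at a random position and
    accept only moves that lower the energy (the ball rolls downhill)."""
    rng = random.Random(seed)
    x = rng.uniform(-10.0, 10.0)
    energy = bumpy_energy(x)
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        trial_energy = bumpy_energy(trial)
        if trial_energy < energy:
            x, energy = trial, trial_energy
    return x, energy


def lowest_energy_result(n_volunteers=200):
    """Collect one result per simulated volunteer and keep the lowest-energy one."""
    results = [volunteer_run(seed) for seed in range(n_volunteers)]
    return min(results, key=lambda result: result[1])


if __name__ == "__main__":
    position, energy = lowest_energy_result()
    print(f"lowest-energy result: x = {position:.3f}, energy = {energy:.3f}")
```

Any single run usually gets stuck in whichever dip is nearest to where it started. Because the runs start in different places, collecting many of them and keeping the one with the lowest energy is what makes finding the deepest dip likely.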

Chris - And what were the users seeing that enabled them to point out to you that they could probably help you out a bit more?

David - They saw the computer trying out many, many different possible shapes for the protein and kind of jumping around a lot. And the participants would look at this and see that it looked like it was flailing around and not really going in the right direction a lot of the time. And that's really what led to Foldit.

Chris - And Foldit was them being able to say, 'well, hang on, let's actually manipulate that bit to there, this looks to me like the right way a protein should fold.' But these weren't protein chemists, these people. These were just computer users at home, weren't they?

David - That's right. They were. And so an important part of Rosetta@home and Foldit, particularly Foldit because people were going to do it themselves, was a series of pedagogical introductory levels where you would go through puzzles of increasing difficulty that were designed to teach you the principles of biochemistry.
