Using machine learning to analyse fossils
The advent of machine learning and AI is reaching back 400 million years to re-evaluate our fossil record. In the past, we were resigned to cracking open a fossil and looking at it under a microscope. But since then, scientists have realised that also present in the fossil are some of the original chemicals that were in the organism when it was alive millions of years ago. A technique called ‘infrared spectroscopy’ can use infrared light to detect and quantify these substances. Researchers have now gone a step further and developed a system that can use the relative compositions of these chemicals to identify what the biological sample might have been. The machine learning algorithm had a test run on the Rhynie Chert, an almost perfectly preserved set of fossils from Scotland, and identified them with remarkable accuracy. So what are the advantages of using machine learning, and where next for the project? Will Tingle spoke to the University of Edinburgh’s Corentin Loron…
Corentin - Using machine learning, computer algorithms, statistics, all those kinds of tools, you are able to go beyond what you can see with your own eyes and access very tiny details that you would've missed if you were just qualitatively analysing your data. I mean, infrared spectroscopy is not new. What is new is to use machine learning with the data from infrared spectroscopy. But what is very interesting with this particular technique compared to all the other types that could be used is that we need only a bare minimum of preparation for the sample. We slice the rock into sections, and those are usually made to look at the fossils under the microscope, and there are thousands of those in museums. So now we can take those out of museums and just apply the technique directly on those. We're not going to do any new destruction. So in that way, and this is exactly what we've done in this study, we were able to go into this precious collection from National Museums Scotland and look at the fossils without destroying any of the rocks. And this is a precious collection. You don't want to destroy those to do studies.
Will - Now that you've had a test run with these well-preserved, well-documented fossils and you've seen that it works, is the plan to go out and find some more ambiguous samples?
Corentin - Obviously, the idea behind looking at this very famous site was that we can recognise with our eyes what the fossils are. And so when we look at the signature, we can do a positive matching. You have a positive control over your data. So now we know machine learning approaches work on fossils because we can actually see that there was a match between what the fossil is and what the signature was. So now we can go back in time to fossil assemblages that are 1 billion to 1.5 billion years old, which contain a lot of fossils, but with very, very simple shapes, very, very simple forms for which we have no idea what they might have been. They might have been algae, they might have been fungi, they might have been protists, some sort of microorganism. And now we have a new type of approach to look at what their affinities, biological affinities, would be.
Will - So this might give us an idea of the biological makeup of billion-year-old organisms?
Corentin - Exactly, exactly, yes.
Will - What does this method reveal about the molecular preservation of samples? Because with a 400-million-year-old sample, I'd personally assume there's not much left?
Corentin - So this site is known for its incredible preservation. It didn't undergo a lot of geological transformations. That's why we can see so many fossils with our eyes under a microscope. But what it revealed on the molecular level is that not only is the morphological preservation amazing, but the molecular preservation is also amazing. And for instance, we were able to see very tiny details of what those fossils were composed of, what types of sugars they were composed of, what types of fossils, for example.
Will - Does this allow us to look at harder-to-reach samples? If we find a perfectly preserved 500-million-year-old sample, that's great, but that's also very rare. Could this machine learning help us identify more partial fossil remains?
Corentin - I would say, theoretically, yes. Now it will depend of course on the conditions in which your fossils were preserved. Because if they were preserved in a very, very harsh environment, if they underwent very high temperatures, then maybe you will see that a fossil was there, but the molecular signature will not be very good. But we're lucky because we have a lot of very ancient fossil sites where the fossils are very well preserved. Some of them are in the UK, the billion-year-old Torridon Group in the north of Scotland, for instance. So definitely they're a good target for this kind of technique.
Will - Is there potential to shake up what we already know? You might go back to something that everyone has assumed is one thing and your method says otherwise.
Corentin - Exactly. I think this is the whole point of this approach, and we do it in a certain way in this paper, when we looked at those curious organisms that were thought to be part of plants, or maybe part of fungi, or maybe bacteria. And with our study, we were able to show that actually they have a molecular composition that is closer to plants than to fungi. So we definitely could go to those cryptic fossils that we have and try to work out where they could fit in the tree of life.