Coding and chatbot AI trump Turing test

But scientists question how failsafe the software can really be...
09 December 2022

Interview with 

Michael Wooldridge, University of Oxford




It's been an exciting week for the field of artificial intelligence (AI). On one front, the team at Google's subsidiary DeepMind have published a paper demonstrating a system that can write its own code. In essence, this is a step towards computers that can programme themselves, and it looks very impressive: you set it a programming challenge in plain English - for instance, to develop a segment of code that manipulates a series of numbers or letters in a certain way - and it does a better job than half of the human programmers given the same challenge. I spoke with Oxford University's Mike Wooldridge, a computer scientist and author who publishes on AI. We began, though, by discussing another AI breakthrough that's been making waves this week: ChatGPT, an artificial intelligence chatbot. As far as Mike's concerned, this passes the revered "Turing test", named after computer scientist Alan Turing, which is a measure of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human...

Mike - There's been a huge buzz on Twitter about the latest AI development, and it's a new system from a company called OpenAI. Slightly ironically named; they're not 'open' at all. They're funded by Microsoft and they're a for-profit company. But they've released a series of tools over the last couple of years which do tasks connected with what we call natural language, which just means ordinary language that people use, languages like English, the language that we're using now. Tools that can communicate in natural language have long been a goal of artificial intelligence and they've been very difficult. This is what Alan Turing was talking about in the 1950s when he introduced the Turing test. Well, the bottom line is, the Turing test is now passed. These new tools can generate text which is easily as good as a very good speaker or writer of English, but it turns out they can also do some other impressive tasks as well. One of the standard things I do when I demonstrate these tools is go to the BBC website, copy a news story, then ask GPT to summarise it and it will produce a startlingly good summary. And I can then say, "what are the three key bullet points about this story?" And it will again do a remarkably good job of producing the three key bullet points. You can then ask it to translate it into any number of other languages, and it will again do an equally good job of that.

Chris - When I talk to researchers that are increasingly employing AI in their practice, I ask them, "can you explain how it's helping to do what it's doing? Is it an explainable AI? Can it tell you how it's doing it?" And they just look at me and they say, "no."

Mike - No. And these systems are enormous. It turns out that to make it work, you need massive what are called neural networks. You need AI supercomputers running for months in order to build these systems. And what comes out is capable of some seemingly remarkable feats. But do we understand how it's doing it exactly? No, we really don't. There are some caveats about this and one of them is when to trust it. They can be very plausible in what they tell you, but sometimes they can be plausible but completely wrong. And if you are gullible, this can set you off on a very bad track indeed. So no, we don't understand exactly how they're doing what they're doing, and this is one of the big challenges with the technology at the moment.

Chris - Well, that was going to be my follow-up point, which is that if we don't know how it works, how do we make sure that what it's generating is reliable and authentic, and that it's not got some kind of glitch which means it summarises that BBC news article but accidentally injects the wrong interpretation? Then, when someone reads that top-level summary, they're given totally the wrong impression. It's like a newspaper printing a misleading headline.

Mike - God forbid that newspapers would ever print misleading headlines. That's absolutely one of the challenges. There's a huge amount of work right now to try to understand exactly where they can be trusted and where they can't. In the short term, where they're most likely to be used is in low-risk scenarios, you know, where these are not life or death situations, where it's not somebody's job that hangs in the balance on the output of these things. But there's a lot of work yet to be done on understanding exactly where they are reliable and where they aren't.

Chris - Can I point you at a paper that's come out this week? It's in the journal Science, and it's presumably founded on the same sort of technology or the same system. What they're saying is, "we've got a system that can write computer code and it does it on average just slightly better than a human can." In the same way that we would give a human instructions - "I want to write a computer programme that does X, Y, and Z" - it seems to be able to take those instructions and turn them into reasonable computer code that solves the problem most of the time. It seems pretty impressive to me that you can do this.

Mike - You are right. The technology is exactly the same. The way that they work - they do what's been called glorified autocomplete. What it's looked at is all the computer programmes that are available on the World Wide Web, and there are a huge number of those. And what this particular system does is, you type what you want the programme to do, it'll generate a large number of possible candidate programmes, and then it'll whittle those down by running some tests to see which ones look like they're producing the likeliest answers. It's very neat. I think it's a lovely result. But I absolutely would not trust a computer programme that came out of that process, at least not in any mission-critical or life-critical situation.
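
The "generate many candidates, then whittle them down with tests" idea that Mike describes can be sketched in a few lines of Python. This is only an illustration, not DeepMind's system: the four candidate functions are hand-written stand-ins for programmes a model might produce, and the task ("sort numbers into descending order"), the example tests, and every name used here are assumptions made up for the sketch.

```python
from typing import Callable, List, Tuple

Candidate = Callable[[List[int]], List[int]]
Example = Tuple[List[int], List[int]]  # (input, expected output)

# Hypothetical stand-ins for programmes sampled from a model.
def cand_a(xs):
    return sorted(xs)                 # wrong: ascending order

def cand_b(xs):
    return sorted(xs, reverse=True)   # correct: descending order

def cand_c(xs):
    return xs[::-1]                   # wrong in general: just reverses

def cand_d(xs):
    return xs[0] / 0                  # broken: raises an exception

CANDIDATES: List[Candidate] = [cand_a, cand_b, cand_c, cand_d]

def filter_candidates(candidates: List[Candidate],
                      examples: List[Example]) -> List[Candidate]:
    """Keep only the candidates that pass every example test."""
    survivors = []
    for cand in candidates:
        try:
            # Pass a copy so one candidate cannot mutate another's input.
            if all(cand(list(inp)) == out for inp, out in examples):
                survivors.append(cand)
        except Exception:
            # A candidate that crashes on any example is discarded.
            continue
    return survivors

# A reasonable set of example tests leaves only the correct candidate.
EXAMPLES: List[Example] = [([3, 1, 2], [3, 2, 1]),
                           ([1, 2, 3], [3, 2, 1]),
                           ([5], [5])]

# A single weak test also lets the "just reverse it" candidate through.
WEAK_EXAMPLES: List[Example] = [([1, 2, 3], [3, 2, 1])]

if __name__ == "__main__":
    print([f.__name__ for f in filter_candidates(CANDIDATES, EXAMPLES)])
    print([f.__name__ for f in filter_candidates(CANDIDATES, WEAK_EXAMPLES)])
```

With the fuller set of examples only cand_b survives, but with the single weak example cand_c slips through as well. That is one concrete reason for Mike's caution: passing the tests you happened to run is not the same as being correct.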

Chris - The difference of course between the two situations is that if it generates computer code, because you give it instructions, "this is what I want to achieve", it will come up with some code for you. You can then interrogate the code it's come up with and you can see how it's working. So there is a possibility of making it explainable in terms of its output when it's generating a tractable thing like computer code that we can understand, isn't there?

Mike - Yeah. Although computer programmes are notoriously difficult to understand. So there's going to be a trade-off between the extent to which it's worth you just writing it yourself, so you know that you understand it, versus the amount of time that you've got to take to convince yourself that it works. But I say what you absolutely shouldn't do is just trust it out of the box. That would be extremely naive and I think that's one of the big worries about this application. That doesn't mean this isn't a very neat result from DeepMind. I think it is. They've demonstrated that, in programming competitions where people are asked to produce programmes to a certain specification in a certain amount of time, they can get very creditable performance on that task. So that's a nice result. But computer programmers I think can sleep easy in their beds. I don't think they're about to be replaced.

