Geoff Hinton: Why does AI get things wrong?

Hallucination or confabulation...
25 June 2024

Interview with 

Geoff Hinton

A stylised computer network.

In this episode of Titans of Science, Chris Smith spoke to AI pioneer Geoff Hinton about how AI works, and what we should be wary of...

Chris - Can those individual layers tell us what they 'think', though? Because one of the problems researchers raise when I go and talk to them is that they would very much like to know how these sorts of systems, once built, arrive at their conclusions. It's the so-called explainability problem. So when a system trained to recognise cancers sees a picture of a cancer, it can explain what particular features of the picture singled out those cells as cancerous. Some models do this, but others don't. Now, is the way they do it that those layers can tell you what they changed in order to produce the output they got?

Geoff - It's not so much to tell you what they changed, but to tell you how they work. So for example, if you take the layer of neurons that receives input from the pixels, let's suppose we're trying to tell the difference between a two and a three. You might discover that one of those neurons in that layer is looking for a row of bright pixels that's horizontal, near the bottom of the image, with a row of dark pixels underneath it and a row of dark pixels above it. And it does that by having big positive connection strengths to the row of pixels that it wants to be bright, and big negative connection strengths to the rows of pixels it wants to be dark. And if you wire it up like that, or rather if it had learned to wire itself up like that, then it would be very good at detecting a horizontal line. That feature might be a very good way to tell the difference between a two and a three, because twos tend to have a horizontal line at the bottom and threes don't. So that's fine for the first hidden layer, the first layer of feature detectors. But once you start getting deeper in the network, it's very, very hard to figure out how it's actually working. And there's a lot of research on this, but in my opinion, it's going to be very, very difficult to ever give a realistic explanation of why one of these deep networks with lots of layers makes the decisions it makes.
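
To make that example concrete, here is a minimal sketch of the kind of hand-wired feature detector Hinton describes: a single neuron with big positive weights on one horizontal row of pixels and negative weights on the rows just above and below it. The image size, weight values, and example digits are illustrative assumptions, not anything from the interview.

```python
import numpy as np

H, W = 8, 8                        # tiny 8x8 greyscale image, pixel values in [0, 1]
weights = np.zeros((H, W))

bright_row = 6                     # row we want to be bright, near the bottom
weights[bright_row, :] = +1.0      # big positive connections to that row
weights[bright_row - 1, :] = -0.5  # negative connections to the row above...
weights[bright_row + 1, :] = -0.5  # ...and to the row below

def detector_activation(image):
    """Weighted sum of pixel intensities: large when a bright horizontal
    line sits at bright_row with darker rows around it."""
    return float(np.sum(weights * image))

# A '2'-like image: a bright horizontal stroke along the bottom.
two_like = np.zeros((H, W))
two_like[bright_row, 1:7] = 1.0

# A '3'-like image: a vertical-ish stroke, no bottom horizontal line.
three_like = np.zeros((H, W))
three_like[2:6, 5] = 1.0

print(detector_activation(two_like))    # large positive response
print(detector_activation(three_like))  # small or slightly negative response
```

In a real network these weights would be learned rather than set by hand, which is exactly why the deeper layers become so hard to interpret.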

Chris - Is the explanation you've given me for how this works pretty generic? So if I took any of these models, they're probably working in a similar sort of way. And if so, when someone says 'I'm working on AI', given that we have that sort of platform, what are they actually working on? How are we trying to change, improve, or develop AI away from that main principle, that core fundamental operating algorithm that you've described for us?

Geoff - On the whole, we're not trying to develop it away from that algorithm. Some people do, but on the whole, what we're trying to do is design architectures that can use that algorithm to work very well. So let me give you an example in natural language understanding. In about 2014, neural networks suddenly became quite good at translating from one language to another. So you'd give them as inputs a string of English words and you'd want them as outputs to produce a string of French words. In particular, given some string of English words, you'd like them to produce the first French word in the sentence, and then, given the string of English words plus the first French word in the sentence, you'd like them to produce the second French word in the sentence. So they're always trying to predict the next word. And you train them up on lots of pairs of English and French sentences. And to begin with, in 2014, when you were trying to figure out the next word, you'd have influences from all the previous words. And then people discovered a bit later on that rather than letting all the previous words influence you equally, what you should do is look at previous words that are quite similar to you and let them influence you more. And so you're not trying to get rid of the basic algorithm or circumvent it, you're trying to figure out how to supplement it by wiring in certain things, like attention, that make it work better.
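
The "let similar words influence you more" idea is what attention does. Here is a minimal sketch of generic scaled dot-product attention, not the exact 2014-era translation architecture Hinton mentions; the vectors are made-up toy values and the dimensions are arbitrary assumptions.

```python
import numpy as np

def attention(query, keys, values):
    """query: (d,) vector for the position being predicted.
    keys, values: (n, d) vectors for the previous words."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)     # similarity of each previous word to the query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax: influence weights summing to 1
    return weights @ values                # similarity-weighted mix of the previous words

rng = np.random.default_rng(0)
prev_words = rng.normal(size=(5, 16))                  # 5 previous words, 16-dim each
current = prev_words[2] + 0.1 * rng.normal(size=16)    # most similar to word 2

context = attention(current, prev_words, prev_words)
# 'context' is dominated by the representation of word 2, the most similar
# previous word, rather than an equal blend of all five.
```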

Chris - When we hear that one problem with the large language models we're seeing manifest very much now is that they can hallucinate, where does that behaviour come from? How do they generate these spurious things that don't exist, yet state them with enormous authority in their outputs? Where does that come from?

Geoff - So first, let me make a correction. It ought to be called confabulation, not hallucination. When you do it with language, it's called confabulation. And this was studied a lot by people in the 1930s. And the first thing to realise is that this makes them more like people, not less like people. So if you take a person and you ask them to remember something that happened quite a long time ago, they will, with great confidence, tell you a lot of details that are just wrong. And that's very typical of human memory. That's not exceptional at all. That's how human memory is. And that's why if you're ever on a jury, you should be very suspicious. When people remember things, they often remember things wrong. So the big chatbots are just like people in that respect. And the reason they're like that, and the reason people are like that, is you don't actually store things literally. We used to have computer memory where you could take, for example, a string of words, store it in the computer memory, and later go and retrieve that string of words and get back exactly the right string of words. That's not what happens in these big chatbots. What the big chatbots do is they look at strings of words and they're trying to change the weights in the network so that they can predict the next word. And all of the knowledge they have of all the strings of words they've seen is in the weights on those connections. And when you get them to recall something, what they're really doing is regenerating it, just as with people. And so they're always constructing these memories, and there's actually no difference between a real memory and a fake memory except that one of them happens to be right. From the point of view of the person constructing it, you don't know which is real and which is fake. You just say what seems plausible to you. And the chatbots do the same. Now the chatbots are worse than people at confabulating, but they're getting better.
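
To illustrate the contrast Hinton draws between literal storage and generative recall, here is a toy sketch (an editorial illustration, not from the interview): a lookup table always returns exactly what was stored, whereas a crude next-word generator, standing in for a network whose knowledge lives in its weights, regenerates a plausible string that may never have been stored verbatim.

```python
from collections import defaultdict
import random

# 1) Literal computer memory: retrieval returns exactly what was stored.
stored = {"fact_1": "the cat sat on the mat"}
print(stored["fact_1"])            # exact string back, guaranteed

# 2) Generative recall: the only "knowledge" kept is which word tends to
#    follow which (a stand-in for learned connection weights); recall means
#    regenerating a sentence word by word.
training = ["the cat sat on the mat", "the dog sat on the rug"]
follows = defaultdict(list)
for sentence in training:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        follows[a].append(b)       # record observed next words

random.seed(1)
word, output = "the", ["the"]
for _ in range(5):
    nexts = follows.get(word)
    if not nexts:                  # no learned continuation: stop
        break
    word = random.choice(nexts)    # predict the next word
    output.append(word)

print(" ".join(output))
# Can produce e.g. "the cat sat on the rug": fluent and confident,
# but never stored anywhere -- a crude confabulation.
```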
