Scientists decode the signals behind our thoughts

15 August 2025

Interview with Erin Kunz, Stanford University

Brain decoders

Scientists at Stanford say they’ve decoded the brain signals behind our inner voice - the thoughts we hear in our heads - with around 74% accuracy. The breakthrough could let people who can’t speak communicate by simply thinking of a chosen word that’s turned into speech. Erin Kunz at Stanford University led the study...

Erin - Working with our participants in previous studies, in which they attempted to speak but had limited ability to do so due to, for example, ALS or stroke, we wanted to investigate an approach that might be more comfortable or easier: imagining speaking instead of actually trying to speak, bypassing the need for physical effort. The second motivation was that, now that these systems are achieving impressive accuracies at decoding pretty much open-ended speech, we wanted to explore the possibility of these systems decoding words that the user may not have intended to be said aloud.

Chris - So is this almost like when we talk to ourselves, it's that inner voice you're seeking to pick up and decode and turn into an output?

Erin - Sort of. So we first were explicitly asking our participants to imagine aspects of them saying those words, so actually imagining the movement of their mouth or, for example, imagining the sound of their voice when they try to say the words.

Chris - And how were you recording the activity? Because this is picking up on brain activity, isn't it?

Erin - Yes, so we're recording from the motor cortex, the part of the brain that controls basically our voluntary movement - when you want to move your hand or when you want to speak. Specifically, we're in the speech motor cortex, focusing in on those areas that control your mouth and your tongue. And we're doing this with devices called micro-electrode arrays. They're about 3.4 millimetres square, so smaller than a pea, and these are placed on the cortex during a surgery. They have 64 electrodes on them, which record the electrical signals of individual neurons firing in the brain. We can record those signals in real time for this study.
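
For readers curious about what a decoder actually receives from these arrays, here is a minimal Python sketch of turning spike events from 64 channels into binned firing-rate features. The channel count comes from the interview; the bin width, the one-second window and the simulated spike times are illustrative assumptions, not details from the study.

```python
import numpy as np

# Minimal sketch: convert raw spike timestamps from a 64-channel
# micro-electrode array into binned firing-rate features, the kind of
# input a speech decoder might consume. The 20 ms bin width and the
# random "spike times" below are assumptions for illustration only.

N_CHANNELS = 64          # electrodes per array (figure from the interview)
BIN_MS = 20              # assumed bin width in milliseconds
WINDOW_MS = 1000         # one second of activity for this example

rng = np.random.default_rng(0)

# Fake spike timestamps (ms) per channel, standing in for real neural data.
spike_times = [np.sort(rng.uniform(0, WINDOW_MS, rng.integers(5, 60)))
               for _ in range(N_CHANNELS)]

n_bins = WINDOW_MS // BIN_MS
features = np.zeros((n_bins, N_CHANNELS))

for ch, times in enumerate(spike_times):
    counts, _ = np.histogram(times, bins=n_bins, range=(0, WINDOW_MS))
    features[:, ch] = counts / (BIN_MS / 1000.0)   # counts -> spikes per second

print(features.shape)   # (50, 64): 50 time bins x 64 channels
```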

Chris - There must be a long training phase then, so presumably you instruct your participant to think about this word or this sound, and you must work your way almost like through a dictionary of different signature sounds or speech outputs so that you can work out what the different neurological patterns are that would correspond to each of those.

Erin - Yeah, that's correct. So we cue participants with sentences on a screen - for example, the sentence 'I feel good'. The participant will see that sentence and then imagine saying it, and then we do several of these sentences. In the study, depending on the participant, they imagined between 80 and 500 of these sentences in order to train the models to decode the patterns of neural activity associated with individual phonemes - individual speech units, basically. In the English language, we have 39 phonemes, and those are what make up the words. And so then the decoder can take that sequence of phonemes and identify what the most likely words were from them.
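
As a rough illustration of that final step - going from a decoded phoneme sequence to the most likely word - here is a toy Python sketch. The miniature pronunciation dictionary and the similarity scoring are assumptions made for illustration; the real system searches a 125,000-word vocabulary with far more sophisticated models.

```python
from difflib import SequenceMatcher

# Toy sketch of the last stage described above: a decoder has produced a
# phoneme sequence, and we look up the word whose pronunciation matches
# it best. The mini-dictionary below uses ARPABET-style phoneme labels
# and is an assumption for illustration, not part of the study.

PRONUNCIATIONS = {
    "i":    ["AY"],
    "feel": ["F", "IY", "L"],
    "fill": ["F", "IH", "L"],
    "good": ["G", "UH", "D"],
    "mood": ["M", "UW", "D"],
}

def best_word(decoded_phonemes):
    """Return the dictionary word whose pronunciation best matches
    the decoded phoneme sequence."""
    def score(word):
        ref = PRONUNCIATIONS[word]
        return SequenceMatcher(None, decoded_phonemes, ref).ratio()
    return max(PRONUNCIATIONS, key=score)

# Suppose the neural decoder output these (slightly noisy) sequences:
print(best_word(["F", "IY", "L"]))   # -> "feel"
print(best_word(["G", "UH", "T"]))   # -> "good" (closest match despite the error)
```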

Chris - And then what do you do, flip it round? So you say, well, now it's learned what the brain does when we tell it to imagine saying X. Now we flip it round, we look at what it's saying it thinks the person said and ask them, is that what you really wanted to say?

Erin - Yes, that's correct. So once the decoders are trained to identify those patterns, then the participants can imagine saying what they want, and the decoder can decode that.

Chris - And how accurate is it when you then start just looking at the activity that's coming out and then asking the person, is that what you were thinking of? How right is it?

Erin - So it varied across participants, but in the best case scenario we were able to get 86% accuracy when decoding from a 50-word vocabulary, and 74% accuracy when decoding from a large vocabulary of 125,000 words, which essentially means being able to say anything you want.

Chris - Well, when one thinks about how many mistakes we make with fat thumbs trying to type text messages, that's pretty good.

Erin - Yeah, it's pretty good. And also for reference, systems like Siri or Alexa typically get around 95% accuracy, so that's generally thought of as being the sort of transition point between usable and not usable.

Chris - Can it continue to learn though? Because obviously everyone's a bit different, and it must be possible to pick up on foibles of how people think or tune it slightly more with time. So can it continue to learn? So will performance potentially continue to rise in these people, or is it topping out at that roughly 75, 80% accuracy?

Erin - Yes, absolutely. So this was an initial proof of concept. So we have pretty limited amounts of training data. And since this study was completed, we've continued to collect some training data. So we're still exploring the possibility of achieving higher accuracies with this type of device.

Chris - And crucially, in the people with disabilities whom this could be applied to, where existing systems are quite fatiguing to use, trying to blink or look at things or breathe to move cursors around, is this a lot less cognitively taxing for these people? So they find that communication is much more effortless?

Erin - I think some participants report that it is less effortful. Some participants don't mind as much when they're attempting to speak. So I think it's just offering another option, depending on user preferences.

Chris - And what did the end users make of it?

Erin - I think there was excitement about the possibility of this both being less effortful, as well as the potential for it to reach faster communication rates. Even though systems like this that are built on attempted speech are quite a bit faster than some of the previous options available, they're still not quite at the speed of typical conversation. I think the fastest study published has reached about 90 words per minute, whereas typical conversational speech is closer to 150 words per minute. So this sort of inner speech decoding may be a way to reach those conversational rates. And notably, at least a few of our participants have expressed enthusiasm about the ability to potentially interrupt a conversation.

Chris - Now, given that what you've effectively got here is a system that can potentially hear a person's thoughts inside their head, there might be things they say to themselves that they didn't actually want the computer to hear. Is there an ethical angle to this as well? And were any of your participants, or anybody in the study, uncomfortable about the fact that you're now probing something that previously would be completely private to a person?

Erin - Yes, we wanted to address this question responsibly. And I will point out that we've also compared the inner speech representation to attempted speech, and there is strong distinguishability between the two. We've actually proposed two methods in the paper for addressing this: the system can either totally ignore inner speech altogether if someone's using a system based on attempted speech, or there's a password that allows the user to control when the decoder is running. You can sort of think of that as saying 'Hey Siri' or 'Hey Alexa' - if it doesn't receive that command first, it just ignores everything.
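
Here is a minimal sketch of how such a password could gate a decoder's output, assuming the decoder produces a simple stream of text utterances. The unlock phrase, the re-lock command and the stream structure are illustrative assumptions rather than details from the paper.

```python
# Minimal sketch of the "password" idea: decoded inner speech is
# discarded unless the user first imagines a chosen unlock phrase, much
# like a wake word for Siri or Alexa. The phrases below are examples,
# not the study's actual configuration.

UNLOCK_PHRASE = "chitty chitty bang bang"   # example user-chosen password
LOCK_PHRASE = "stop listening"              # example command to re-lock

def gate_inner_speech(decoded_utterances):
    """Yield only the utterances the user has explicitly unlocked."""
    unlocked = False
    for text in decoded_utterances:
        if not unlocked:
            if text.strip().lower() == UNLOCK_PHRASE:
                unlocked = True          # start passing output through
            # anything decoded before the password is silently ignored
            continue
        if text.strip().lower() == LOCK_PHRASE:
            unlocked = False             # back to private mode
            continue
        yield text

stream = ["what time is it", "chitty chitty bang bang",
          "i feel good", "stop listening", "this stays private"]
print(list(gate_inner_speech(stream)))   # -> ['i feel good']
```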

Chris - So there's like a wake word for the interface. So the person can divorce themselves from having their thoughts read when they want a private moment.

Erin - Yes.

 
