Giving a Voice to Silent Speech

01 May 2011

Interview with

Michael Wand, Karlsruhe Institute of Technology

Synthesised male voice -    The recogniser interprets my muscle movements; therefore, I can talk to you by simply mouthing words.  Do you have any questions?

Chris -   Can you now introduce yourself for me, and tell us what it is you've actually developed that we've just been listening to?

Michael -   My name is Michael Wand and I am from the Karlsruhe Institute of Technology, which is the largest research institution in Germany.  The system we've just been listening to is a system for silent recognition of speech.  What we do is put little electrodes on our faces.  This technology is called surface electromyography because it catches the myo signals, the muscle signals, from my face.  And from these signals, we can retrace what has been said.

Chris -   So the computer is recording the electrical signals coming off of each of the muscles as you speak and it’s working out, based on the pattern of muscle activity, what you must've said.

Michael -   That's exactly right.  So the signal which one can actually see on the screen is fed into the statistical recogniser which recognises the pattern of the muscular activity and can retrace what was said.
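
Editor's illustration: the idea of a statistical recogniser that matches patterns of muscle activity can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the method used by the Karlsruhe system, whose internals the interview does not describe. The "recordings" are synthetic noise standing in for electrode signals, the three-channel layout and the word list are invented, and the classifier is a simple nearest-average matcher chosen for brevity.

```python
import math
import random

def rms_features(recording):
    """Reduce a recording (one list of samples per electrode) to one RMS value per channel."""
    return [math.sqrt(sum(x * x for x in ch) / len(ch)) for ch in recording]

def train(examples):
    """examples maps word -> list of recordings; returns word -> average feature vector."""
    model = {}
    for word, recordings in examples.items():
        feats = [rms_features(r) for r in recordings]
        model[word] = [sum(col) / len(col) for col in zip(*feats)]
    return model

def recognise(model, recording):
    """Return the trained word whose average pattern is closest to this recording."""
    feats = rms_features(recording)
    return min(model, key=lambda word: math.dist(model[word], feats))

def fake_recording(gains, rng, n=200):
    """Synthetic stand-in for EMG data: zero-mean noise, scaled per channel (assumption)."""
    return [[rng.gauss(0.0, g) for _ in range(n)] for g in gains]

# Hypothetical activity patterns: each word drives a different channel hardest.
PATTERNS = {"yes": [1.0, 0.2, 0.2], "no": [0.2, 1.0, 0.2], "hello": [0.2, 0.2, 1.0]}

rng = random.Random(0)
training = {w: [fake_recording(g, rng) for _ in range(5)] for w, g in PATTERNS.items()}
model = train(training)

print(recognise(model, fake_recording(PATTERNS["no"], rng)))
```

The model here is built from one speaker's examples only, which mirrors the point made in the next exchange: a different face produces different patterns, so the system has to be adapted to each user.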

Chris -   So you must've trained this to recognise the pattern of movements you would make when speaking.  So if I took the electrodes off your face (and they're decorated all around your mouth, picking up all the major muscles I presume) and put them on mine, then it wouldn't recognise what I was saying in the same way as it recognises what you're saying.

Michael -   That's correct.  This system is geared towards me.  If someone else wanted to use it, it would require a few minutes of recordings just to adapt itself.

Chris -   So it’s quite quick then.  You can train it relatively fast.

Michael -   That's true, yes.  The systems we present are based on about 5, 6, 7 minutes of training, which actually works quite well for a sort of limited vocabulary.  The more training data I put into the system, the better it gets.  Our best system, which we didn't bring today, takes about 45 minutes of training, which is akin to a conventional speech recogniser that you can buy in a shop, and that recognises about 2,000 words.  So it's maybe not quite as good as a traditional speech recogniser, but it is perfectly suitable for communication without actually being heard.

Chris -   Now it’s interesting because you're from Germany, but you're speaking to me in English.  So you've trained this in English presumably.

Michael -   Yes.  We've trained this system in English because we are a very international institute and we came here to Washington DC just to present it.  It's in principle not a problem to adapt such a system to any other language.  It would essentially just mean changing the dictionary of pronunciation, telling the system how German is pronounced, and then probably re-training it, because German has got different sounds to English.  Then it works for any other language: French, Spanish, whatever you like.
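
Editor's illustration: the role of the pronunciation dictionary can be sketched as a table from words to sound sequences. The tiny lexicons below are invented for illustration, with rough IPA-style symbols, and are not taken from the actual system. The sketch only shows why swapping the dictionary may not be enough on its own: a new language can contain sounds that no trained model covers yet.

```python
# Toy pronunciation dictionaries (hypothetical entries, rough IPA-style symbols).
ENGLISH = {
    "yes": ["j", "ɛ", "s"],
    "no": ["n", "oʊ"],
    "hi": ["h", "aɪ"],
    "in": ["ɪ", "n"],
}
GERMAN = {
    "ja": ["j", "aː"],
    "nein": ["n", "aɪ", "n"],
    "ich": ["ɪ", "ç"],
}

def sound_inventory(lexicon):
    """Collect every distinct sound that appears in a pronunciation dictionary."""
    return {sound for sounds in lexicon.values() for sound in sounds}

def untrained_sounds(trained_lexicon, new_lexicon):
    """Sounds the new language needs that the trained dictionary never used."""
    return sound_inventory(new_lexicon) - sound_inventory(trained_lexicon)

missing = untrained_sounds(ENGLISH, GERMAN)
print(sorted(missing))
```

Any sound that shows up in `missing` has no training examples behind it, which is one way to read the remark that the system would probably need re-training for German.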

Chris -   Who do you see using this?  Who is this targeted at or what sort of market would this go into?

Michael -   So right now, it's still a research project since we're a university, but there's a huge body of interest.  On one side, it comes from people who have lost their voices.  There is quite a large group of larynx cancer patients out there.  They can usually move their mouths quite normally, but they lack a sound source because their larynx has essentially been cut away, and these people are very, very eager to get their voices back.  They have a huge interest in using the system, no matter what it looks like, and it's going to look much better in the future of course.  But they'd certainly be willing to use such a system.  And our market would really be a system which is a bit nicer, like a little electrode headset which you can just put on.  Then I might use the system, for instance, to augment my cell phone, and I could use it when I get a phone call while I'm in a meeting, just to communicate silently.

Chris -   So no more shouting on trains.

Michael -   Exactly, yes.

Chris -   The only problem that I can see at the moment is that it does make you look rather strange.

Michael -   Well currently, we are working on different kinds of electrodes together with our cooperation partners from industry and from science, and they are going to look different in the future.  Right now, we are looking into electrode technologies and multi-electrode technologies which might make it possible to use far fewer electrodes and get a much better signal, which will also improve accuracy and make the system more robust.  What we might use is a system where I put on a kind of headset which looks just like a normal microphone headset, and nonetheless contains electrodes.  I can see this in about 2 to 3 years' time.
