Developing synthesised speech
Due to his motor neurone disease, Stephen Hawking already had limited powers of speech. But in 1985, when he contracted near-fatal pneumonia, the tracheostomy that saved his life simultaneously robbed him of his ability to speak. Yet his ability to communicate his science is undoubtedly one of his most outstanding characteristics, and this was made possible by the speech synthesiser system that ultimately became one of his most powerful trademarks. Chris Smith spoke with Lama Nachman, Director of Anticipatory Computing at Intel, who developed the assistive computer system that enabled Hawking to interact with the world...
Chris - How did Stephen Hawking use his first speech technology; how did it work?
Lama - Initially, after he lost his voice, he used spelling cards for a while, where he would essentially use his eyebrow to indicate yes and no. Then there was an early piece of software, made by a company called Word Plus, called Equaliser. Basically, it allowed him, with a joystick, to select letters and complete words so that he could actually speak. Alongside that there was another piece of software called Speech Plus, which took the output from that first system and essentially spoke it out through an analogue speech synthesiser. To date, this is the speech synthesiser he has been using all along. The reason is that he really came to associate it with his voice. So, through all the technologies and all the improvements, he continued to use that one piece of software for the actual speech synthesis.
Chris - How did he control the system in the first place, and how did that have to evolve as his condition evolved?
Lama - Initially, he was able to use his hand to control a joystick. However, by 2008 he couldn't do that any more because his hands were not strong enough. So his technical assistant at the time managed to cobble together some different off-the-shelf components and build a sensor system that attached to his glasses. It's essentially an infrared sensor: when you move, the sensor detects that movement. It's similar to what you have in phones today; for example, when you bring your phone close to your ear, it detects that there is something close to it.
Chris - Is this an eye movement? It was looking for him moving an eye or looking in a certain direction or was it just detecting facial movement?
Lama - It was literally his cheek movement. As he moved his cheek up, the sensor detected that movement and sent a signal which was equivalent, essentially, to the pushing of a button; he just pushed the button with his cheek.
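The mechanism Lama describes, turning a raw proximity reading into a discrete button press, can be sketched as a threshold crossing with edge detection (this is an illustrative assumption, not the actual sensor firmware; the readings, threshold, and function name are hypothetical):

```python
# Sketch of converting an infrared proximity signal into click events:
# a cheek movement raises the reading above a threshold, and only the
# rising edge counts, so a held cheek does not repeat-fire.

def clicks_from_readings(readings, threshold=0.5):
    """Count rising-edge threshold crossings in a stream of sensor readings."""
    clicks = 0
    above = False
    for value in readings:
        if value > threshold and not above:
            clicks += 1            # rising edge = one "button press"
        above = value > threshold
    return clicks

# Two cheek movements in this simulated stream yield two clicks.
print(clicks_from_readings([0.1, 0.7, 0.8, 0.2, 0.6, 0.1]))  # → 2
```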
Chris - And what, the system is then presenting possible things he might want to say and he’s selecting and slowly honing down on a list of things to build words, sentences, phrases, and so on?
Lama - Exactly. Imagine a keyboard, for example, where letters continue to get highlighted, and when the letter of interest is highlighted, he would move his cheek and it would select that letter and enter it. There is word prediction, so he could select the word he wants if it shows up as he starts to type those letters.
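The scanning keyboard Lama describes is a classic single-switch interface. A minimal sketch (assumed for illustration, not Intel's actual software) looks like this: the highlight cycles through the characters on a timer, and a single switch press selects whatever is currently highlighted.

```python
import itertools

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def scan_select(target_letters):
    """Simulate single-switch scanning: for each wanted letter, the
    highlight cycles through the alphabet until it reaches that letter,
    at which point one switch press (a cheek movement) selects it."""
    typed = []
    highlight = itertools.cycle(ALPHABET)
    current = next(highlight)
    for target in target_letters:
        while current != target:       # wait for the highlight to arrive
            current = next(highlight)
        typed.append(current)          # the single switch press selects it
    return "".join(typed)

print(scan_select("hello"))  # → hello
```

The cost of this interface is waiting time, not clicks: each letter costs exactly one press, but the user must wait for the scan to reach it, which is why word prediction (mentioned above) matters so much.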
From that original Word Plus system, he moved to something called Easy Keys, which allowed him to control his whole Windows interface. Now imagine that, beyond just clicking a button, you can in the same way emulate something like a mouse movement. The system would scan the whole screen and, as the highlight came closer to the row of interest, he would click. Then it would start going through the columns and, as it got to the column of interest, he would click again, so he could reach any point on his screen. Essentially, that allowed him to control his Windows machine.
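The row-then-column scanning described above can be sketched as a two-stage scan over a grid (a simplified assumption about how such mouse emulation works, not the actual Easy Keys code): the first click locks a row, the second click picks a cell within it, so two clicks reach any point.

```python
# Sketch of row/column scanning: the highlight sweeps down the rows
# until the first click, then along that row's columns until the second
# click, which selects the cell.

def row_column_scan(rows, cols, click_at_row, click_at_col):
    """Return the (row, col) cell selected by two timed clicks, where
    click_at_row / click_at_col are the scan steps at which the user
    triggers the switch, or None if no click lands in range."""
    for r in range(rows):              # stage 1: highlight sweeps the rows
        if r == click_at_row:          # first click locks this row
            for c in range(cols):      # stage 2: sweep this row's columns
                if c == click_at_col:  # second click selects the cell
                    return (r, c)
    return None

print(row_column_scan(10, 10, 3, 7))  # → (3, 7)
```

With an n-by-n grid this costs at most 2n scan steps and exactly two clicks, instead of n squared steps for scanning every cell in sequence.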
Chris - You, at Intel, got involved about five years ago; so what was the step-change that you brought to the party?
Lama - In 2011, he reached out to Intel and, basically, his issue was that he couldn’t really control his interface well any more, and part of that was because it was very hard for him to move his cheek very reliably and trigger that sensor. And part of it was that the whole system was old enough that, as he had a hard time controlling the sensor, everything else, essentially, ended up being too slow.
We went out there and first tried to understand what needed to change, and through observing him for months, and months, and months, we thought "there are so many different things that we could do". There is gaze tracking, there's the brain-computer interface, all of these revolutionary methods that could help him. As we continued to look through these things, he continued to reject them, and some of them simply didn't work for him: the gaze tracker couldn't lock on his gaze, and the brain-computer interface couldn't get signals from his brain; he joked that maybe he didn't have any brain signals to be measured!
Through all of this, we realised that he wasn't really looking for something revolutionary; he really wanted something familiar. Then we went back to the drawing board to try to understand how we would design a system that was more efficient but still looked and felt the same. I think that was pretty much the hardest part of what we had to do.
Chris - You obviously triumphed because you kept him communicating?
Lama - Yes. It took a couple of years and a lot of failures, but then, after that, I think we finally figured out how to keep the look and feel the same, while automating a lot of what was being done with the mouse, and being done inefficiently. If you imagine, for example, somebody wanting to open a file, it doesn't really make sense to think of that as a whole series of mouse clicks that take forever. Instead, you want to automate that whole process under the hood and just give him a few options he can select from. That, essentially, was the "aha" moment: we looked through every single thing that he did with his machine and, over time, built a lot of automation under the hood to actually make it faster.
The other part of it was typing and communicating, so as to speak to others; there, word prediction brought a huge improvement. We were really just trying to reduce the number of clicks he needed to make before the system would predict what he was trying to do, and to do that in a way that was cognisant of what he was trying to communicate. Because when he's trying to talk to people, it's very different from when he wants to type a document, do his research, or search the web.
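The click-saving effect of word prediction can be illustrated with a toy prefix-matching predictor (purely illustrative; the real system was context-aware and far more sophisticated, and the vocabulary below is invented):

```python
# Toy prefix-based word predictor: after each typed letter the system
# offers completions, so a word costs a few clicks instead of one click
# per letter.

VOCAB = ["black", "blackboard", "black hole", "boundary", "brane"]

def predict(prefix, vocab=VOCAB, k=3):
    """Return up to k vocabulary words starting with the typed prefix."""
    return [w for w in vocab if w.startswith(prefix)][:k]

print(predict("bl"))   # → ['black', 'blackboard', 'black hole']
print(predict("bo"))   # → ['boundary']
```

Selecting "black hole" from the first candidate list costs three clicks (two letters plus one selection) rather than ten, which is exactly the kind of saving Lama describes; adapting the vocabulary to conversation versus document-writing is the "cognisant of what he was trying to communicate" part.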
Chris - Wonderful work that you did. What was it like working with Stephen Hawking, was he a good customer?
Lama - He was phenomenal. It was probably the most amazing thing to watch. I mean, he was a force of nature. He just persevered; he kept working and working on improving that system and giving us all the feedback we needed to make it into something he could use.