Computers making phone calls
At a conference last week, a demo was played of a phone conversation between someone booking a hair salon appointment for a client, and a hair salon worker receiving the booking. So far, so normal; only one of the women was not human. Chris Smith spoke to tech investor Peter Cowley to find out more...
Hello. How can I help you?
Hi. I’m calling to book a woman’s haircut for a client. Um, I’m looking for something on May 3rd.
Sure. Give me one second.
Sure, what time are you looking for around?
We do not have a 12pm available. The closest we have to that is a 1.15.
Do you have anything between 10am and, uh, 12pm?
Depending on what service she would like. What service is she looking for?
Just a woman’s haircut for now.
Chris - Peter, why are Google doing this?
Peter - We’ve heard of Google Assistant, which is like Alexa and Siri: basically you ask a question and you get a reply. This is taking that further. It’s called Google Duplex, where Duplex in this situation means bidirectional, so it’s generating dialogue. And the reason they started this, apparently, is because 60 percent of all businesses in the US don’t take online bookings; they only take them by voice. So they’ve demonstrated this to show that it can be done (A) interactively, and (B) with these umms and errs so it sounds like a human being.
I demonstrated this to a group last night at a dinner party and they were absolutely astonished, and there were several people who got it wrong, which one was...
Chris - Yes, which one was real, yeah. Why on earth were you doing that at a dinner party? Didn’t you have anything decent to talk about?
Peter - Because I was coming to see you today, Chris.
Chris - But, how on earth are they doing this?
Peter - Well, speech recognition has been around for 20 or 30 years and it’s getting better and better. And many of our listeners will have some sort of device in the home which is recognising what you’re saying; most of the time it’s getting it right.
We’ve had speech synthesis around for years and years and years. This is coupling the two together. But the important thing is it’s putting in a level of machine learning, or artificial intelligence if you want, which will actually interpret what’s being said by the human and then play something back using synthesis.
Chris - What’s the processing overhead behind this? Is this an enormous supercomputer required to do this? Or is this potentially scalable, as in do you need a small device to do this so that you could have people having conversations with computers that are meaningful without having to really use the entire Google network to do it?
Peter - A great question. It will end up in our phones at some point. Basically, machine learning requires what’s called a neural net, and neural nets generally are fairly slow and very processor intensive when they’re done in software, but they will end up in hardware. So at the moment yes, it does use a massive set of servers, but in time it will be possible to do it on a much smaller device.
Chris - Now, down to the point of actually why they are doing this, what’s their perceived need? Why do they think that people are going to buy into this, and why do they think people want this kind of umm and err and pretending to be human? Because, actually, that’s a bit deceitful. That makes me uneasy.
Peter - Yeah. Let’s answer the second part first. Communication is more natural if you get the umms and errs and hesitations etc. So then it becomes much more acceptable that it is bidirectional with a human being and, actually, the end result will be better. The human that’s receiving it, in that case the hair salon, is unlikely to recognise that it wasn’t a human at the other end. It hasn’t got impatient or anything or hasn’t tried it out so it will actually speed up that sort of communication. It also, for Google, is doing a demonstration of where the technology is going because it will become bidirectional, Duplex will become a dialogue.
Chris - There was a gentleman we had on this programme a few years back who actually showed that people tend to put in errs and umms ahead of an item in a sentence that they were seeking to emphasise. And he noticed this particularly in parents teaching children language because if you say oh, look at the umm plane in the sky. Actually what it does is it creates a bit of cognitive space, but it also cues the individual oh look, potentially new word or important word coming, you must attend to this. So is Google adding the errs and umms in the right place and does it have the capacity to learn to do that?
Peter - Again, another great question. I think it comes back down to communication: perfect communication without the umms and errs sounds stilted. I mean, you run podcasts, I run podcasts, we actually take our umms and errs out. I don’t know if you do, after you’ve put this on the Naked Scientists?
Chris - We don’t sterilize the conversation completely for the simple reason that it doesn’t sound like natural speech if you do that. And for the reason that I learned from reading this gentleman’s paper in Science that actually I might be, by sterilizing the errs and umms, I might actually be taking the emphasis away from key components of the sentence, so we want it to sound natural. And of course, all our guests here on the programme are so good at speaking that they don’t say err and umm very much anyway.
Peter - Not me, umm.
Chris - No, not very much. But is there not a risk though that umm people might be deceived by a system like this, or feel deceived?
Peter - Yes, I don’t know if you noticed, in the press last week there was a guy that was doing robocalls in the States; these are robotic calls. And these are simple ones, and he was fined almost 90 million pounds for doing this, for spamming people, effectively. But, if you took it further on, you have got this ethical issue that you could confuse people. Even solicitors are being conned, with the right communication, into handing over large sums of money.
Chris - So in future we might see these sorts of calls opening with: by the way, you’re talking to a non-human assistant?
Peter - Google have already said that. They’ve already said that, even between last week and this week, they’re going to. In some states in the States and other parts of the world they have to say so because it’s a recording. And secondly, they’re going to introduce it with: this is Google Assistant, or something.