Neil S. Briscoe asked:
I was on a video call today. That was bad enough. What was worse, however, was that when I spoke to the people at the far end, I could continually hear my own voice echoed back from their end. This meant I would start speaking, hear myself, and then struggle to continue the conversation because I found the delayed sound of my own voice confusing.
It seems the human brain doesn't like even a slight delay like that.
I am wondering why this might be.
Chris - I think we’re very used to the fact that when we speak, sound reaches us from two different sources. One is the vibrations travelling out of our mouths, through the air and into our ears. The other is that when we speak, sing or make noises, the vibrations also pass into the bones of the skull and reach the inner ear by that route.
So we have these two sources of sound coming at us, and I think we learn to control our speech patterns, our loudness and the cadence of our speech (the things that make us sound interesting) by listening to ourselves in real time. We get used to that latency, the delay between making a sound and hearing it come back to us.
Obviously, when the sound travels by those two routes, out of your mouth and back into your ears, the latency is incredibly short, so the feedback loop is optimised to work on that timescale. When your voice comes back to you through electronic equipment instead, the delay is much bigger, and your brain gets confused: it is trying to use the feedback to control what you're saying, but the information it is listening for arrives much later than it expects. That catches the brain's attention, and it says, “Right, I'm now listening for the sound. It’s not there… Oh, there it is!” The delay sidetracks you, and it’s that sidetracking that we find distracting until we learn to suppress or ignore it.
I think that's basically what's going on. If you spent your life on video conferences, you would probably find it a lot easier to cope with, but I wouldn’t advise it.
I always thought video conferences would work a lot better if they were sound only and somebody held up a cardboard mounted photo of whoever happened to be talking at the other end. I suppose with a little ingenuity the lips could even be made to move. The money we whizzed away on videoconferencing rooms and crap that never really worked was just ridiculous.
Isn't it a neurological version of this latency that causes stammering? The person receives a slightly delayed version of the sound they have just made, so they get caught in a loop trying to form a syllable.