Finding one voice in a crowd

How do you pick one signal out of a noisy place?
26 March 2019

Interview with Sahil Gupta, Soundskrit


Music festival


The market for voice-activated products is exploding. Millions of people are embracing this technology. Many of you probably use it yourself: Amazon have their Echo, rivals Google have their home assistant, and there are numerous other recent entries to the market. But all of these voice-activated gadgets face a similar challenge: they struggle to pick out the person talking to them in the cacophony of a noisy environment. At the moment, they solve the problem by including a bulky array of microphones that resolves the direction of a sound and allows the device to decipher what’s being said and by whom. But this takes up a lot of space and ultimately constrains the design of the product. Now a company called Soundskrit reckon they have the answer. Taking their inspiration from chirping insects, they’ve been able to shrink a microphone to something smaller than the nail on your little finger. Adam Murphy heard how from Soundskrit’s co-founder Sahil Gupta...

Sahil - We're working on a new type of directional microphone to improve audio capture and speech recognition in noisy environments. You know, we've seen a lot of different voice-based applications emerge that require the ability to sense sound from a distance. And the challenge is, you have a lot of noise in the world, so when you're trying to pick up the voice of somebody that's standing a few feet in front of you, you also pick up all the background noise. And to combat that, people usually have to use very big microphones or microphone arrays, and that concept can't really be shrunk down well. So we have a new directional microphone concept where we can create a single-chip solution smaller than my fingernail and give you very good directional performance and noise rejection.

Adam - So how does it actually work? Because it would be very useful here in a very noisy conference centre.

Sahil - Yeah, yeah, absolutely! So it's interesting, right, because the concept was originally inspired by insects, actually. We’re working with a professor who has been studying insects for the last 30 years, because they have a really small auditory system, and they use the hairs on their body to sense sound. Imagine I had a string that I held between my two fingers: if I blow across that string it’s going to move, but if I blow parallel to it, it won't move, right? And so it only senses sound, air, coming from a specific direction. I can take that same string and layer multiple strings in different directions, and now I can kind of mechanically filter out the sounds coming from each different direction. And then in software I can combine everything and mix it together, or I can listen to only a specific one, depending on the application.
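The string picture Sahil describes can be sketched numerically. What follows is a toy model, not Soundskrit's design: it assumes an idealised string that responds only to the part of the air motion crossing it at right angles, two such strings laid at 0 and 90 degrees, and a software mix that "steers" towards a chosen direction. The function names `string_response` and `steer` are hypothetical, made up for this illustration.

```python
import math

def string_response(string_angle_deg, source_angle_deg, amplitude=1.0):
    """Idealised output of one string (an assumption, not a measured response).

    The string only feels the component of the air motion that is
    perpendicular to it, so flow along the string gives zero output.
    """
    delta = math.radians(source_angle_deg - string_angle_deg)
    return amplitude * math.sin(delta)

def steer(out_0, out_90, look_angle_deg):
    """Mix the outputs of strings laid at 0 and 90 degrees in software.

    With the model above, out_0 = s*sin(theta) and out_90 = -s*cos(theta),
    so this mix equals s*cos(theta - look): largest for sound arriving
    from the look direction, zero for sound arriving at right angles to it.
    """
    look = math.radians(look_angle_deg)
    return out_0 * math.sin(look) - out_90 * math.cos(look)

# A talker at 30 degrees, picked up by the two strings.
talker_angle = 30.0
a = string_response(0.0, talker_angle)
b = string_response(90.0, talker_angle)

on_target = steer(a, b, 30.0)    # listening towards the talker
off_target = steer(a, b, 120.0)  # listening at right angles to the talker
print(round(on_target, 6), round(abs(off_target), 6))
```

In this sketch the same pair of recorded outputs can be re-mixed after the fact to listen in any direction, which is the "combine in software" step mentioned in the interview; a real device would also have to deal with frequency dependence, noise and three dimensions.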

Adam - And how good have you found it to be so far?

Sahil - It works quite well. I mean, in this sort of environment, you know, we can have a couple of people talking into the microphone at the same time and we can transcribe what both of the people are saying. We can localise where you're standing, where you're moving, where you're walking, even with all the noise. It’s still a very early proof of concept and we definitely have a lot of R&D to do before it’s ready for the market. But yeah, right now it's looking good.

Adam - What do you hope the end product looks like?

Sahil - It would be just a single chip that you could fit into something like your phone or your Apple AirPods, or your smart speakers and smart home devices. That's on the hardware side, and we’re also developing a sort of software layer. So we give you this directional microphone, and we also want to provide a sort of toolset where you have new features you can mess around with to get different types of effects and whatnot.

Adam - What kind of things are you envisioning people do with that?

Sahil - You know, there's obviously speech recognition, right? So think about the Amazon Echo: it's this big, clunky speaker. We could give you that type of capability on just a single chip that you could drop into any of your devices. And outside of speech recognition: right now with my phone, if I pinch my two fingers I can zoom in and out of my video. What if I could do that with my audio as well? Now I can zoom in and out of my audio and even filter out more of the background or take more of it in. Or I could tap on my iPhone, and wherever I touch on the screen it will selectively listen only to the sound in that portion of the picture. Also spatial audio: right now there are so many people looking at binaural playback, stereo playback from headphones. But where can you get the content from, right? Now from your smartphone you can actually capture that spatial content, and then when you play it back through a headset or speaker you have the content to actually play back.

Adam - So it's loads of little microphones working together, is that what the strings are basically?

Sahil - Yeah, yeah, pretty much. And the nice thing is we use these nanostructures that we can package on a single chip, so to the end customer it looks just like a normal single microphone. And then, you know, if you want to do some really crazy stuff, we could use several of our microphones and get even better noise rejection, even farther listening ranges, things like that.

