Choir volume: could more singers mean quieter?

David Cooper · « **on:** 04/04/2013 20:11:30 »

If one person sings, (s)he sings at a certain volume. If two people sing the same notes at the same volume, the sound waves could add together or cancel out, so at times it might sound louder and at other times quieter. With a thousand singers though, the sounds should for the most part cancel out.

If the voices of a thousand singers all produced sound waves which added together, they'd be a thousand times louder than a single singer, but that clearly never (or almost never) happens, so there is clearly a lot of cancelling out going on. Of course, there's rarely going to be a point where all the singers cancel out completely either as there will at any moment be a random excess of them pushing the sound waves one way more than the other, so that means a choir should actually continue to get louder as the number of singers goes up (because the size of the random excess will get larger with it), but how much does the volume actually go up on average in proportion to the number of singers? Is there a formula for this?

evan_au · « **Reply #1 on:** 04/04/2013 20:58:33 »

When you add an extra singer, the total sound power increases.

It is true that at some instants, in some listening positions, at some frequencies, the sound from the second singer will cancel out the sound from the first singer
...but this will be overridden by the fact that at most frequencies, in most listening locations, most of the time, the sound power will be higher.
To get a long-lasting and deep sound cancellation, you need very precise sine-wave generators, not human singers.

The effect you notice is that in many areas of human perception, the response to stimulus is logarithmic, not linear.

This is why the Decibel scale of sound perception is calculated as 10*LOG(P₁/P₂).
Adding a second singer increases the sound level by 3dB, which is just perceptible to the human ear under normal conditions
Adding 9 singers increases the sound level by 10dB, which is audibly much louder
Adding 99 singers increases the sound level by 20dB
Adding 999 singers increases the sound level by 30dB
There is the same perceived increase in volume in going from 1 to 10 singers as there is going from 100 to 1000.

See: http://en.wikipedia.org/wiki/Sound_pressure#Sound_pressure_level
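The decibel figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming every singer contributes equal power and that the voices are uncorrelated, so that powers simply add; the function name `level_gain_db` is made up for the illustration.

```python
import math

def level_gain_db(n_singers):
    """Gain in dB of n equal, uncorrelated singers over one singer: 10*log10(n)."""
    return 10 * math.log10(n_singers)

# Total choir sizes of 2, 10, 100 and 1000 (i.e. adding 1, 9, 99 and 999 singers).
for n in (2, 10, 100, 1000):
    print(f"{n:5d} singers: +{level_gain_db(n):.1f} dB")
```

Each tenfold increase in the number of singers adds the same 10 dB step, which is the point made about 1 to 10 versus 100 to 1000.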

RD · « **Reply #2 on:** 05/04/2013 00:45:35 »

Quote from: evan_au on 04/04/2013 20:58:33

To get a long-lasting and deep sound cancellation, you need very precise sine-wave generators ...

and the sound waves must be in anti-phase ... http://en.wikipedia.org/wiki/Phase_%28waves%29#Phase_difference

David Cooper · « **Reply #3 on:** 05/04/2013 21:51:39 »

Quote from: evan_au on 04/04/2013 20:58:33

When you add an extra singer, the total sound power increases.
It is true that at some instants, in some listening positions, at some frequencies, the sound from the second singer will cancel out the sound from the first singer
...but this will be overridden by the fact that at most frequencies, in most listening locations, most of the time, the sound power will be higher.

The sound power must also be higher when you generate more sound to cancel sound, and yet the volume somehow goes down. The difficulty I'm having with this is that with a large number of singers you should get substantial amounts of the same kind of cancellation because the alignments will become better and better as the numbers go up.

Quote

To get a long-lasting and deep sound cancellation, you need very precise sine-wave generators, not human singers.

It shouldn't matter if the voices are wobbly - if there are enough of them, the precision should improve because every wobble will end up being cancelled out by an opposite wobble (ignoring any random excesses that take things more in one direction than the other).

Quote

The effect you notice is that in many areas of human perception, the response to stimulus is logarithmic, not linear.

I thought scales of this kind were only logarithmic in order to cover a huge range without the values becoming unmanageable and that any logs in formulae are there for conversion purposes to fit in with the distorted scale.

Quote

This is why the Decibel scale of sound perception is calculated as 10*LOG(P₁/P₂).
Adding a second singer increases the sound level by 3dB, which is just perceptible to the human ear under normal conditions
Adding 9 singers increases the sound level by 10dB, which is audibly much louder
Adding 99 singers increases the sound level by 20dB
Adding 999 singers increases the sound level by 30dB
There is the same perceived increase in volume in going from 1 to 10 singers as there is going from 100 to 1000.

That appears to allow no room for any cancellation to take place at all, and yet with a large choir there should be total cancellation as every wave will be cancelled out by an opposite one. What am I missing?

evan_au · « **Reply #4 on:** 05/04/2013 23:10:06 »

Interference from 2 point sources is illustrated in the diagram showing interference fringes at the bottom of this webpage:

Note that:

This assumes the "ideal" case of 2 point sources, of identical and constant frequency, intensity, and phase
Cancellation is only complete at particular points where the listener hears precisely equal intensity from the 2 sources, and 180 degrees out of phase. At all other points, there is some sound power present.
At an equal number of points, the sound power is doubled through constructive interference, where the listener hears precisely equal intensity from the 2 sources, and 0 degrees phase (in-phase).
The average effect, over the entire listening space, is that the sound power is doubled.
The human voice is not a pure, constant frequency of fixed phase - it consists of a number of higher-frequency resonances (or formants) which occur at different frequencies for different people because of the shape of their throat (even if they are singing the same fundamental frequency). Different frequencies will not cancel each other at all - they add in power.
If the two singers do happen to produce the same higher frequency, it will cancel at different positions in the listening space than the fundamental frequency, so those who would hear nothing from a pair of sine-wave generators would hear something from a pair of human singers.
Cancellation occurs only if the two sources have equal amplitude. However, the vocal resonances are excited in bursts by the opening and closing of the vocal folds (in the larynx, behind the "Adam's apple"), which means that the amplitude of sound from the two singers is not constant, and so any cancellation will be short-lived. These bursts occur asynchronously in different singers, and so the sound power adds.
Each of these resonances is not a single frequency, but consists of a variety of nearby frequencies, and so these do not cancel out between different singers
A human singer cannot keep the phase of their voice constant, so the audience position of the cancellations and additions will change dynamically - the average effect is that the sound power will be doubled
The human head does not have a single point receiver - it is a distributed receiver. If one ear hears a cancellation, the other ear almost certainly will not. The overall effect of more singers is more received power. (Modern mobile phones and WiFi devices use the same technique.)
When you add multiple point sources (eg 3 or more singers), the probability of getting a total cancellation drops, since there are very few points in which the amplitude and phase cancel for all 3 singers, at all of their formants.

So, sound cancellation does occur temporarily at particular points in the audience, but it is more than overcome by the increase in transmitted (and received) sound power at multiple frequencies and phases at every point in the audience as you add more singers.
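The claim that different frequencies "add in power" can be tried numerically. The sketch below is an illustration only: it assumes two pure tones at 200 Hz and 230 Hz standing in for two singers (real voices are far richer), and an 8000 Hz sample rate chosen for convenience. It compares the mean squared pressure of each tone with that of their sum.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative assumption)

def mean_power(samples):
    """Mean of the squared samples, proportional to average sound power."""
    return sum(s * s for s in samples) / len(samples)

def tone(freq_hz, n_samples=SAMPLE_RATE):
    """One second of a unit-amplitude sine at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n_samples)]

a = tone(200)  # hypothetical singer A
b = tone(230)  # hypothetical singer B, at a slightly different frequency
both = [x + y for x, y in zip(a, b)]

print(mean_power(a), mean_power(b), mean_power(both))
```

The combined signal does dip towards silence at some instants (the beats between the two tones), but averaged over the second its power comes out at roughly the sum of the two individual powers rather than less.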

David Cooper · « **Reply #5 on:** 06/04/2013 22:10:29 »

Quote from: evan_au on 05/04/2013 23:10:06

At an equal number of points, the sound power is doubled through constructive interference, where the listener hears precisely equal intensity from the 2 sources, and 0 degrees phase (in-phase).

The average effect, over the entire listening space, is that the sound power is doubled.

How can it be doubled on average when it's only doubled in some places and cancelled out entirely in others (while being somewhere in between at the rest)? On average, it looks as if it should be the same as with one sound source.

Quote

The human voice is not a pure, constant frequency of fixed phase - it consists of a number of higher-frequency resonances (or formants) which occur at different frequencies for different people because of the shape of their throat (even if they are singing the same fundamental frequency). Different frequencies will not cancel each other at all - they add in power.

The larger the number of singers, the less relevant those differences become: most sounds should be cancelled out by other sounds and the volume should drop.

Quote

If the two singers do happen to produce the same higher frequency, it will cancel at different positions in the listening space than the fundamental frequency, so those who would hear nothing from a pair of sine-wave generators would hear something from a pair of human singers.

But again in a large group there would be other sounds which cancel the other frequencies from them as well.

Quote

Cancellation occurs only if the two sources have equal amplitude. However, the vocal resonances are excited in bursts by the opening and closing of the vocal folds (in the larynx, behind the "Adam's apple"), which means that the amplitude of sound from the two singers is not constant, and so any cancellation will be short-lived. These bursts occur asynchronously in different singers, and so the sound power adds.

Again this is something that would be evened out in a large group and lead to a reduction in volume. The same applies to many of your later points, so I won't keep repeating it.

Quote

When you add multiple point sources (eg 3 or more singers), the probability of getting a total cancellation drops, since there are very few points in which the amplitude and phase cancel for all 3 singers, at all of their formants.

I wouldn't expect a total cancellation, but it still looks as if there should be enough cancellation to reduce the volume as the numbers go up, assuming an even distribution of emitted sounds, but then you are never going to get an even distribution - there will be random weightings which fluctuate, and those excesses will by themselves always lead to larger groups being louder as the numbers go up. The point of my original post was that this might be the real mechanism behind it and that it may not have been looked at properly by science. A soloist can still be heard over a choir of a hundred voices all singing at perhaps half power - they are not fifty times louder.

evan_au · « **Reply #6 on:** 07/04/2013 00:29:45 »

Quote

distorted scale

The decibel scale actually reflects the behaviour of human hearing:

The ear is continually bombarded by the random impacts of air molecules, moving at just about the speed of sound
And yet (under laboratory conditions) it can pick out the periodic motion of air molecules, moving with an amplitude of the width of a hydrogen atom
Painfully loud sounds are about 100dB louder, a range of power of about 10¹⁰
The human auditory system compresses this huge range into a more manageable scale
...and it's not just for Sound: Psychological studies of human response to wealth (eg gold), food and pain show a similar logarithmic response, spanning many orders of magnitude in stimulus.

So the scale is not distorted - it is actually a reasonable reflection of human perception of many kinds of stimuli.

evan_au · « **Reply #7 on:** 07/04/2013 08:16:39 »

A friend who sings with a choir joked today that if you have more than 1 singer, all of the singers are singing quietly so they can hear and follow the other singers...

Quote

A soloist can still be heard over a choir of a hundred voices all singing at perhaps half power - they are not fifty times louder.

These days, if you have a nominated soloist, they usually have their own microphone, which gets turned up louder than all the other people in the choir, so you can still hear them clearly.

David Cooper · « **Reply #8 on:** 07/04/2013 20:34:20 »

Quote from: evan_au on 07/04/2013 00:29:45

And yet (under laboratory conditions) it can pick out the periodic motion of air molecules, moving with an amplitude of the width of a hydrogen atom.

That's quite a thought. I suppose the cochlea amplifies it by narrowing, generating an effect like the Severn Bore to make it detectable.

Quote

So the scale is not distorted - it is actually a reasonable reflection of human perception of many kinds of stimuli.

It's clearly distorted, designed as it is to fit in with our use of base ten. It also muddies the waters when you're trying to understand things in terms of doublings of volume.

Anyway, I've been adding waves together in a computer by taking identical waves and offsetting them with an even distribution, the result being that they appear to cancel each other out, much as should happen with most of the sound made by a large number of singers, though I can't work out where the energy goes to as it can't be cancelled.

I've also been looking at the component sounds of vowels in a wave editor (which I have just started writing), and they do appear, as you would expect, to be combinations of a handful of near-perfect sine waves. The vowel "ee" looks particularly simple, probably just being two sine waves with a quiet high-pitched harmonic imposed on a deeper note. I'm writing more software now to check this by breaking the sound down into the individual waves that make up the more complex ones (though I'm doing this work primarily as an attempt to add speech recognition to my own operating system).

evan_au · « **Reply #9 on:** 07/04/2013 21:01:54 »

When you measure the amplitude of a sound (eg the vowel "ee") with a microphone feeding into a computer, you are actually measuring the Sound Pressure Level.

The microphone is insensitive to the constant air pressure, and only measures the variation from the average. The average variation/value of a sine wave is zero. So if you average many sine waves, the average will still be zero.

However, the power & energy of the sound is proportional to the (SPL)². The average of (Sin)² is not zero, and if you add up the energy of many sine waves, the average is also not zero.

This is why the calculation of sound power in decibels has two formulae

One is related to Power (or the Energy you get if you accumulate the Power over time): 10*LOG(P₁/P₂)
See http://en.wikipedia.org/wiki/Decibel#Power_quantities
The other is related to SPL (or the Voltage you get out of your microphone): 20*LOG(V₁/V₂)
See http://en.wikipedia.org/wiki/Decibel#Field_quantities
The difference in the 10 or 20 factor accounts for the squaring of the SPL (or voltage) to produce Power or Energy
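A short sketch of both points, under the assumption of an ideal unit-amplitude sine sampled over exactly one cycle: the plain average of the pressure is zero, the average of its square is not, and the 10*LOG and 20*LOG formulae give the same answer once the squaring is taken into account. The function names are invented for the example.

```python
import math

N = 1000  # samples across exactly one cycle
cycle = [math.sin(2 * math.pi * i / N) for i in range(N)]

mean_pressure = sum(cycle) / N               # averages to about 0
mean_square = sum(s * s for s in cycle) / N  # averages to about 0.5

def db_from_power(p1, p2):
    """Decibels from a ratio of powers (or energies)."""
    return 10 * math.log10(p1 / p2)

def db_from_amplitude(v1, v2):
    """Decibels from a ratio of pressures (or microphone voltages)."""
    return 20 * math.log10(v1 / v2)

print(mean_pressure, mean_square)
# Doubling the amplitude quadruples the power, and both formulae agree (about 6 dB).
print(db_from_amplitude(2.0, 1.0), db_from_power(4.0, 1.0))
```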

evan_au · « **Reply #10 on:** 07/04/2013 21:25:45 »

The usual way of doing speech recognition starts by breaking the speech into short segments (eg 5-10ms) and performing a Fast Fourier Transform to identify the main frequencies in this segment. For vowel sounds, there will be several formants which can be used to distinguish the different vowels; for unvoiced sounds like "sh", "s" and "f", there is a white-noise type spectrum.

But the characteristics of speech change rapidly, especially around plosives like "p" and "b", so the changes in successive segments must be tracked.

These sequences must then be mapped onto candidate words and sentences, taking into account that different people have different pitches to their voice, different regional accents, and different speeds and intonation.
See http://en.wikipedia.org/wiki/Speech_recognition#Algorithms

These algorithms are not simple, so why not start from some freeware or open-source speech recognition software, eg:
http://savedelete.com/7-best-free-speech-recognition-software.html
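To give a feel for the first step described above, here is a minimal sketch that analyses one 10 ms segment. It uses a naive discrete Fourier transform written out in full rather than a real FFT (an FFT produces the same numbers with far fewer operations), and the test segment is synthetic: two sine waves at 300 Hz and 2300 Hz, chosen only as stand-ins for two formants. The sample rate and function names are assumptions for the example.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (illustrative assumption)

def dft_magnitudes(frame):
    """Naive O(N^2) DFT; returns an amplitude estimate for bins 0..N/2.
    (Bins 0 and N/2 come out doubled with this scaling; they are unused here.)"""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        mags.append(2 * abs(acc) / n)
    return mags

def strongest_frequencies(frame, sample_rate, count=2):
    """Frequencies (Hz) of the `count` largest bins, in ascending order."""
    mags = dft_magnitudes(frame)
    bin_hz = sample_rate / len(frame)
    ranked = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)
    return sorted(k * bin_hz for k in ranked[:count])

# 10 ms segment (80 samples) containing two made-up "formants".
frame = [math.sin(2 * math.pi * 300 * i / SAMPLE_RATE)
         + 0.5 * math.sin(2 * math.pi * 2300 * i / SAMPLE_RATE)
         for i in range(80)]

print(strongest_frequencies(frame, SAMPLE_RATE))
```

Note that the transform works directly on the list of sample values. With an 80-sample segment the bins are 100 Hz apart, so a short segment trades frequency resolution for time resolution.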

David Cooper · « **Reply #11 on:** 08/04/2013 00:37:40 »

Quote from: evan_au on 07/04/2013 21:01:54

When you measure the amplitude of a sound (eg the vowel "ee") with a microphone feeding into a computer, you are actually measuring the Sound Pressure Level.

The microphone is insensitive to the constant air pressure, and only measures the variation from the average. The average variation/value of a sine wave is zero. So if you average many sine waves, the average will still be zero.

I see, so you could somehow have lots of sound and anti-sound where nothing is heard, but the pressure is constantly higher while it's going on. I'm trying and still failing at the moment to imagine this working with two sine waves where they cancel each other out: one is sending out a "wave" of increased pressure (the "wave" being the part of a wave where air is being pushed away) while the other is sending out a "wave" of reduced pressure, and the two should cancel out without any overall increase in pressure. There must be some serious flaw somewhere in my way of looking at this.

Quote from: evan_au on 07/04/2013 21:25:45

The usual way of doing speech recognition starts by breaking the speech into short segments (eg 5-10ms) and performing a Fast Fourier Transform to identify the main frequencies in this segment. For vowel sounds, there will be several formants which can be used to distinguish the different vowels; for unvoiced sounds like "sh", "s" and "f", there is a white-noise type spectrum.

Yes, I discovered many years ago that you could easily distinguish between all speech sounds by looking at a display of highly degraded input - I did some experiments on a ZX Spectrum+3 (bought 15 years ago for £2 in a jumble sale - first machine I ever programmed on). By speaking into it via the tape interface I was able to record "sound" and play it back through the TV speaker where speech was only just intelligible. When the microphone/speaker's moving part (technical term for this component not known to me) was to one side it was recorded/sent as a 1, and when to the other side it was a 0, so the only indication of how far away from the centre it ever moved was how long it took to switch from 0s to 1s and back. This is like working with .wav files where there is only a middle and top position displayed (and stored). Even with so little signal to work with though, I could read the speech sounds just from looking at the clumps of zeros and ones on the screen, distinguishing patterns in the white noise of fricatives/sibilants as easily as with the vowels.

Quote

But the characteristics of speech change rapidly, especially around plosives like "p" and "b", so the changes in successive segments must be tracked.

With plosives you get momentary bursts of fricatives which are too short to notice when listening but which are still picked up by the ear and which provide the distinctive character of the plosives: "t" gives you a little burst of "s", for example.

Quote

These algorithms are not simple, so why not start from some freeware or open-source speech recognition software, eg:
http://savedelete.com/7-best-free-speech-recognition-software.html

It would be no fun just getting it all from someone else, but I'm also sure it can be done a lot better, and starting out by studying how it's normally done isn't likely to lead to finding better ways to attack the problem. Also, however efficient the Fourier Transform may be, surely it requires you to turn the wave into an equation to start with? That looks like a complex task which is likely to cost as much time as it saves.

When thinking about the cochlea a while back, it struck me that the hairs are probably tuned to resonate at particular frequencies (one frequency per hair) to detect whether those frequencies are present, thereby saving the brain from doing a lot of complex maths. There's an obvious way to do the equivalent of that using very simple code which I am going to try out: pick a frequency, count +ve displacement for half the time, then count -ve displacement for the other half and spread this over three wavelengths, then you can tell by combining the two counts if there's a sound present either at or near to that frequency (though you'd need to do it from three staggered starting points to guarantee that one of them will detect a signal if there is one). If (and only if) there is a sound at or near that frequency, repeat the measurement over a larger number of wavelengths and for different frequencies nearby to pin down which is/are still potentially present. Where two nearby frequencies might temporarily be in sync, neither can be ruled out at that moment, but each can be picked up or ruled out further away where they should be out of sync, at which point you should discover whether you're dealing with one frequency or two similar ones (or more). The tighter you want to pin it down, the more processing you have to do, but it would be restricted to those areas where the possibilities are still open - if the first test finds nothing, there is no further processing to do near that frequency at that point. Perhaps 3 or 4 tests per octave (major or minor third intervals) over a range of five or six octaves would be applied at any one point, and those that find a potential resonance would then be retested to pin them down to quarter-tone resolution (or better).
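One possible reading of the detector described above, sketched in Python. This is an interpretation, not the poster's actual code: it assumes "counting" means summing sample values, that the three staggered starting points are spaced a third of a period apart, and that an 8000 Hz sample rate is used. All names are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative assumption)

def half_cycle_response(samples, sample_rate, freq_hz, start=0, cycles=3):
    """Add samples in the first half of each probe cycle, subtract those in the second."""
    period = sample_rate / freq_hz  # probe period, in samples
    n = int(cycles * period)
    total = 0.0
    for i in range(n):
        in_first_half = (i % period) < period / 2
        total += samples[start + i] if in_first_half else -samples[start + i]
    return abs(total) / n

def detect(samples, sample_rate, freq_hz, cycles=3):
    """Best response over three starting points staggered by a third of a period."""
    period = sample_rate / freq_hz
    starts = [int(k * period / 3) for k in range(3)]
    return max(half_cycle_response(samples, sample_rate, freq_hz, s, cycles)
               for s in starts)

tone_400 = [math.sin(2 * math.pi * 400 * i / SAMPLE_RATE) for i in range(200)]
tone_800 = [math.sin(2 * math.pi * 800 * i / SAMPLE_RATE) for i in range(200)]

print(detect(tone_400, SAMPLE_RATE, 400))  # strong response: tone matches the probe
print(detect(tone_800, SAMPLE_RATE, 400))  # near zero: tone is an octave above the probe
```

In effect this multiplies the signal by a square wave at the probe frequency. A square wave contains odd harmonics, so a probe built this way should also respond, more weakly, to a tone at three times the probe frequency, and a three-cycle window responds to a fairly broad band around the probe.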

If a more mathematical approach is actually faster, I can always replace part of my code with it later on, but what matters now is simply getting it to work.

evan_au · « **Reply #12 on:** 08/04/2013 10:05:07 »

Quote

I see, so you could somehow have lots of sound and anti-sound where nothing is heard, but the pressure is constantly higher while it's going on.

Sorry, that wasn't really what I was trying to convey - sine waves won't increase the average "DC" sound pressure, only the "AC" sound pressure. I would complete this sentence as "you could have lots of sound and anti-sound where nothing is heard, but there will be an adjacent location where the sound amplitude is increased."

Quote

thinking about the cochlea...

The Fourier analysis can do a similar frequency analysis to the hairs of the cochlea, picking out the main frequencies in a complex sound.

When a sound consists of a fundamental and higher-frequency components, counting the zero-crossings produces a misleadingly high answer, because the higher frequencies produce extra zero-crossings. You can reduce this impact by low-pass filtering the sound, which will help estimate the fundamental frequency - but then you lose the higher frequencies.
Vowel recognition is assisted if you collect the frequencies of all formants - and a bank of bandpass filters can certainly do this. The FFT analysis measures all formants simultaneously, with a surprisingly small number of calculations. (The same argument goes for distinguishing fricatives.)
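The zero-crossing problem is easy to reproduce. The sketch below assumes a 100 Hz fundamental with an optional third harmonic at twice the fundamental's amplitude (an exaggerated, made-up voice) and estimates the frequency from the number of sign changes per second.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative assumption)

def zero_crossing_estimate(samples, sample_rate):
    """Estimate frequency as (number of sign changes / 2) per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / 2 / (len(samples) / sample_rate)

def voiced(harmonic_amp, f0=100.0, seconds=1.0):
    """Fundamental plus a 3rd harmonic; the 0.1 rad offset keeps samples off exact zeros."""
    n = int(seconds * SAMPLE_RATE)
    out = []
    for i in range(n):
        x = 2 * math.pi * f0 * i / SAMPLE_RATE + 0.1
        out.append(math.sin(x) + harmonic_amp * math.sin(3 * x))
    return out

pure = voiced(0.0)  # fundamental only
rich = voiced(2.0)  # strong 3rd harmonic added

print(zero_crossing_estimate(pure, SAMPLE_RATE))  # close to the 100 Hz fundamental
print(zero_crossing_estimate(rich, SAMPLE_RATE))  # pulled up towards the harmonic
```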

David Cooper · « **Reply #13 on:** 08/04/2013 19:12:22 »

Quote from: evan_au on 08/04/2013 10:05:07

Sorry, that wasn't really what I was trying to convey - sine waves won't increase the average "DC" sound pressure, only the "AC" sound pressure. I would complete this sentence as "you could have lots of sound and anti-sound where nothing is heard, but there will be an adjacent location where the sound amplitude is increased."

But if it's only increased to twice the volume there, the average volume will be less than twice as high as with one singer, and it still looks as if it should be the same as the volume with one singer. I must be making an error somewhere there, but I still can't see what it is.

Quote

The Fourier analysis can do a similar frequency analysis to the hairs of the cochlea, picking out the main frequencies in a complex sound.
...
The FFT analysis measures all formants simultaneously, with a surprisingly small number of calculations. (The same argument goes for distinguishing fricatives.)

How do you get the raw data into the right form to start with though? I'd have thought the process of turning a stretch of wave into some kind of equation for that wave would involve a large number of steps which would be just as complex as my way of analysing it, and then with the addition of having to do a further bit of maths with it to transform it into a series of individual component waves afterwards.

Quote

When a sound consists of a fundamental and higher-frequency components, counting the zero-crossings produces a misleadingly high answer, because the higher frequencies produce extra zero-crossings. You can reduce this impact by low-pass filtering the sound, which will help estimate the fundamental frequency - but then you lose the higher frequencies.

Yes, the zero-crossing on the ZX Spectrum was missing a lot of detail of the story, but it's still clear from what it was picking up that the task is simpler than a lot of people imagine it is - you can do an initial analysis on a tiny amount of extremely impoverished data and get practically the full story from that (in terms of recognising speech sounds), and I see that as a possible route to making the analysis faster. I am working now with proper .wav files this time though, and that makes it possible to pick up every detail of the story, opening the door to doing all other kinds of sound recognition. All I have to do is compare +ve with -ve movement (without waiting for it to cross the zero line) - if there's enough movement to make a hair tuned to a particular frequency resonate, the sound will be detected by the ear and must also be detectable by my method of simulating that, so I should be able to match the ability of the ear. I just have to do lots of experiments to work out how many wavelengths need to be processed to get enough information out, and find ways to deal with what happens when there are complications like the pitch of the base note changing during a word, but there is without question a solution there waiting to be found which doesn't depend on any complex maths, and it may turn out to be quicker.

evan_au · « **Reply #14 on:** 08/04/2013 22:08:20 »

Quote

But if it's only increased to twice the volume there, the average volume will be less than twice as high as with one singer

The Sound Pressure Level is twice as high where there is constructive interference, which means that the power is 4 times higher, because Power is proportional to (SPL)².
Overall, SPL is increased by a factor of 1.4, and the audio power increases by (1.4)², ie doubled.
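These numbers can be reproduced with a small calculation. The sketch assumes two equal sources of unit amplitude at the same frequency, and that the listening positions sample every relative phase between 0 and 360 degrees equally often, which is an idealisation of a real room. It reports the power relative to a single source.

```python
import cmath
import math

def relative_power(phase_rad):
    """Power of two equal unit-amplitude sources, relative to one source alone."""
    return abs(1 + cmath.exp(1j * phase_rad)) ** 2

# One listening position per degree of relative phase.
phases = [2 * math.pi * k / 360 for k in range(360)]
powers = [relative_power(p) for p in phases]

max_power = max(powers)                 # in phase: pressure x2, power x4
min_power = min(powers)                 # anti-phase: complete cancellation
mean_power = sum(powers) / len(powers)  # average over all positions: power x2

print(max_power, min_power, mean_power, math.sqrt(mean_power))
```

The gain at the in-phase positions (from 1 up to 4) is larger than the loss at the cancelled positions (from 1 down to 0), which is why the average lands on 2 rather than 1, with an average pressure ratio of about 1.4.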

Quote

compare +ve with -ve movement

This method of analysis is equivalent to high-pass filtering, so it will emphasise the higher formants, rather than detecting the fundamental tone. From what I have read, you get most information from the lowest formant, and successively less from each higher formant.

Quote

How do you get the raw data into the right form to start with though?

The functions of speech recognition are normally broken into different modules, which ideally operate in parallel:

Sound Pressure Level collection from the microphone port (producing samples of a .wav file)
Running the frequency analysis algorithm
Matching the audio samples to potential syllables & words
Matching the potential words to syntactically valid sentences
Running applications under the operating system
If hardware support is available, these can truly run in parallel (like human hearing); if only software support is available, you must simulate some parallelism by using interrupt routines.
Unfortunately, the frequency analysis algorithm is very processor-hungry (FFT and bandpass filtering is best done on a Digital Signal Processor, if you have one)

David Cooper · « **Reply #15 on:** 09/04/2013 00:31:38 »

Quote from: evan_au on 08/04/2013 22:08:20

The Sound Pressure Level is twice as high where there is constructive interference, which means that the power is 4 times higher, because Power is proportional to (SPL)².
Overall, SPL is increased by a factor of 1.4, and the audio power increases by (1.4)², ie doubled.

So does that mean it actually sounds four times as loud at those points where the two waves add together? I suppose it must do if there's to be an increase in volume overall.

I still have a problem though with a large group of singers, or sources of sound where it still looks as if there will always be cancelling out going on to the point that the volume goes down. In the simplest case of this we could have a single speaker with multiple inputs trying to make it do different things, all of which add up to zero movement. The power is applied, but it's all used in opposing other applications of power. In this case no volume of sound is produced at all. If you then introduce more than one speaker, the same kind of thing could apply, but in this case the speakers would be restricting each other's movement to reduce the volume. In the same way with a choir, the sound made by singers could reduce the volume generated by other singers. Is that not possible?

Quote

Quote
compare +ve with -ve movement
This method of analysis is equivalent to high-pass filtering, so it will emphasise the higher formants, rather than detecting the fundamental tone. From what I have read, you get most information from the lowest formant, and successively less from each higher formant.

I'm starting out by looking for sounds of low frequency, and the same method really ought to work - the cochlea can't be doing anything substantially differently, and I very much doubt that the brain does complex maths to solve problems of this kind. It may turn out to be necessary to build up more of a picture of average location of the wave over each half of a cycle, but the numbers I'm getting out so far make it look as if it may be viable without going that far. Also, the ZX Spectrum experiments show that measuring time spent to either side of the zero line will easily identify the base note, so that's another simple approach which could be combined with what I'm already doing. I'll know more once I've got a bit more code written and can start to display the results in spectrographic form (there's probably a proper name for the kind of diagram I have in mind, but if so it escapes me).

Quote

Quote
How do you get the raw data into the right form to start with though?
The functions of speech recognition are normally broken into different modules, which ideally operate in parallel:

My question wasn't about all the rest of the process, but purely about how you would actually feed the numbers from a .wav file into a formula that can do something useful with them. I've now got a slight hint as to how that's done from the http://en.wikipedia.org/wiki/Fast_Fourier_transform page, but I still can't gauge how much processing it involves to get a proper feel for whether it really has an advantage or not.

The rest of the speech recognition process is obvious, though I will add another step involving semantics (which is where this ties in with my linguistics work, but I don't intend to discuss that).

Quote

Unfortunately, the frequency analysis algorithm is very processor-hungry (FFT and bandpass filtering is best done on a Digital Signal Processor, if you have one)

For the moment I'm not keen to write device drivers for a multiplicity of sound cards, so I'm restricted to using just the CPU and FPU for everything, getting all the sound recordings in as .wav files. If I can get to the point where it all works, I'll then look at sound cards and write device drivers for the machines I program on so that I can get sound in from a microphone and use the speakers, and after that I can look at the extra processing that the sound card can do and think about changing to using FFT if it looks as if it's more efficient, but I can get all the main work done in advance of that and find out exactly what can be done by simpler analysis (of the kind which I suspect the brain actually does). I'm planning to release my code for this part of the process if I can get it to work (just the code for turning the .wav data into a spectrograph of component sounds), so I'll let you know where you can download it if I manage to get it working.
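On the question of getting .wav numbers into a frequency analysis: the raw sample values are the input, and no equation for the wave is needed first. The sketch below uses Python's standard `wave` and `struct` modules and assumes a mono, 16-bit PCM file. Since no recording is available here, it builds a synthetic 440 Hz test file in memory; `write_test_wav`, `read_wav_samples` and `strength_at` are names invented for the example.

```python
import io
import math
import struct
import wave

def write_test_wav(freq_hz, sample_rate=8000, n_samples=800, amplitude=10000):
    """Build a mono 16-bit PCM .wav in memory (a stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"".join(
            struct.pack("<h", int(round(
                amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate))))
            for i in range(n_samples)))
    buf.seek(0)
    return buf

def read_wav_samples(fileobj):
    """Return (sample_rate, list of integer sample values) from a mono 16-bit .wav."""
    with wave.open(fileobj, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2
        n = w.getnframes()
        rate = w.getframerate()
        raw = w.readframes(n)
    return rate, list(struct.unpack("<%dh" % n, raw))

def strength_at(samples, sample_rate, freq_hz):
    """Amplitude of the freq_hz component: multiply the raw samples by a cosine
    and a sine at that frequency, sum each product, and combine the two sums."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

rate, samples = read_wav_samples(write_test_wav(440))
print(strength_at(samples, rate, 440))   # close to the 10000 amplitude written above
print(strength_at(samples, rate, 1000))  # close to zero: nothing present at 1000 Hz
```

`strength_at` is one term of a discrete Fourier transform. An FFT computes this same quantity for every frequency bin at once by sharing the intermediate sums, which is where its speed advantage comes from.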

David Cooper · « **Reply #16 on:** 09/04/2013 18:47:19 »

I'm not sure the base note is going to be as important as I'd thought - it looks as if the vowels are white noise hovering around the frequencies of a series of harmonics not connected with the base note, but related to the size of your mouth/throat. I was looking at the wave for an ee with a falling base note and it was clear that the high note on top of it was maintaining its frequency throughout rather than falling to keep level with the base note.

If you whisper rather than speak, you can hear their invariable pitch without being distracted by what the vocal cords are doing (though they will doubtless vary between different people). The sequence of harmonics if we start on a convenient note is: D, D', S', D", M", S", Tb", D"', R"', M"', and the vowels seem to be distributed starting from the second of those: oo, oh, aw, ah, a, uugh, eh, ay, ee. If you alternate between whispering aw (the vowel in "saw") and eh (the vowel in "pen") you can hear that the jump between them is an octave.
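The octave claim can be checked with a little arithmetic, taking the mapping described above at face value (vowels assigned in order to harmonics 2 to 10; that assignment is the post's observation, not an established result). The names `harmonic_of` and `semitones_between` are just for this sketch.

```python
import math

vowels = ["oo", "oh", "aw", "ah", "a", "uugh", "eh", "ay", "ee"]
# Assumption taken from the post: vowels sit on harmonics 2..10 in order.
harmonic_of = {v: k for k, v in enumerate(vowels, start=2)}

def semitones_between(v1, v2):
    """Interval between two vowels' harmonics, in equal-tempered semitones."""
    return 12 * math.log2(harmonic_of[v2] / harmonic_of[v1])

octave_jump = semitones_between("aw", "eh")   # harmonic 4 up to harmonic 8
print(harmonic_of["aw"], harmonic_of["eh"], octave_jump)
```

Harmonic 8 is exactly double harmonic 4, so the jump from aw to eh comes out as 12 semitones, which agrees with what the ear reports.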

Edit: there's another white noise "note" that goes down while the other one goes up, at least with the top three vowels, so it may be possible to distinguish them from those two components alone, but I'm not ready to start programming that part of the process yet. I have now got it picking out all the frequencies in whole-tone steps, using 4 wavelengths to pick up white noise and 16 wavelengths for anything close to a sine wave. It looks as if most of the useful work in speech recognition depends on the white noise components, and they involve less processing so it's looking hopeful.

wolfekeeper · « **Reply #17 on:** 27/04/2013 00:26:06 »

The equation for the amplitude of a group of singers all singing with the same volume is:

Av = Vs SQRT(n)

where n is the number of singers, Vs is the volume of the singers, Av is the overall volume

This assumes that the singers' voices are not well correlated and are adding more or less randomly, and that while they may be roughly the same pitch, they're not exactly in phase. For human singing this is highly likely to be a very good approximation.

However, for people listening, you want to know the perceived volume. Humans perceive 'loudness' roughly logarithmically, so, as can be seen by taking logs of the above equation, n people raise the sound level by only 10*LOG(n) dB: about 3dB for two singers and 10dB for ten.
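The square-root behaviour is easy to check numerically. The sketch below is only a toy model under stated assumptions: each singer is a unit-amplitude tone at the same pitch with an independent random phase, represented as a phasor, and the function name `rms_amplitude` and the trial count are arbitrary choices.

```python
import math
import random

def rms_amplitude(n_singers, trials=2000, seed=1):
    """RMS amplitude of the sum of n unit tones with random phases.

    Each tone is a unit phasor; the squared length of the summed phasor
    is averaged over many random phase assignments.
    """
    rng = random.Random(seed)
    total_sq = 0.0
    for _trial in range(trials):
        re = im = 0.0
        for _singer in range(n_singers):
            phase = rng.uniform(0.0, 2.0 * math.pi)
            re += math.cos(phase)
            im += math.sin(phase)
        total_sq += re * re + im * im
    return math.sqrt(total_sq / trials)

rms_1 = rms_amplitude(1)
rms_100 = rms_amplitude(100)
gain_db = 20 * math.log10(rms_100 / rms_1)
print(rms_1, rms_100, gain_db)
```

With 100 singers the amplitude should come out close to 10 times a single singer (not 100 times, and not zero), which is a level gain of about 20 dB.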

David Cooper · « **Reply #18 on:** 27/04/2013 21:37:21 »

Quote from: wolfekeeper on 27/04/2013 00:26:06

The equation for the amplitude of a group of singers all singing with the same volume is:

Av = Vs SQRT(n)

where n is the number of singers, Vs is the volume of the singers, Av is the overall volume

This assumes that the singers' voices are not well correlated and are adding more or less randomly, and that while they may be roughly the same pitch, they're not exactly in phase. For human singing this is highly likely to be a very good approximation.

Thanks for that formula. Again it suggests the volume will go on rising without limit as the number of singers goes up, but that's the part I'm questioning. Let me simplify things a bit more. Suppose all the singers are singing into microphones which are all wired up to a single speaker elsewhere and it's the volume coming from that speaker that we're interested in rather than hearing the singers directly. Let's also assume that the speaker is capable of producing infinite volume if required. What I want to understand is why they don't all cancel out once there are enough of them to ensure that for every force trying to push the speaker one way there will always be an equal force trying to push it the other way. The formula doesn't appear to address that.

wolfekeeper · « **Reply #19 on:** 27/04/2013 22:32:37 »

In the exact situation you describe they add statistically; you can show that the standard deviation of the average of n signals goes as 1/sqrt(n) times the deviation of a single signal. (The standard deviation is the square root of the sum of the squared deviations from the mean, divided by the number of samples.)

But the standard deviation of an audio signal, the root of the average squared deviation from the mean, is its (RMS) amplitude.

So if there are n of them, the sum is n times the average: n * 1/sqrt(n) = sqrt(n), and you get a total root-mean-square amplitude that goes with the square root of n.

I mean, ten people are going to be louder than one.

But to a degree they will cancel out, so the amplitude won't be ten times bigger.

It turns out the amplitude goes as the square root of the number of people.
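Written out, the argument looks like this. It is a sketch under the assumptions already stated: each singer's signal x_i has zero mean, the same RMS amplitude Vs, and is uncorrelated with the others, so the covariance terms vanish.

$$
\operatorname{Var}\!\Big(\sum_{i=1}^{n} x_i\Big)
= \sum_{i=1}^{n} \operatorname{Var}(x_i) + \sum_{i \neq j} \operatorname{Cov}(x_i, x_j)
= n V_s^2 + 0
\quad\Longrightarrow\quad
A_v = \sqrt{n V_s^2} = V_s \sqrt{n}
$$

$$
\text{level gain} = 20 \log_{10}\!\frac{A_v}{V_s} = 10 \log_{10} n \ \text{dB}
$$

Complete cancellation would need the covariance terms to add up to exactly -n Vs^2, which means the voices would have to be systematically anti-correlated, not merely random. Perfectly in-phase voices are the opposite extreme, where the covariance terms bring the amplitude up to n Vs.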
