Choir volume: could more singers mean quieter?

  • 47 Replies
  • 12529 Views


*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Choir volume: could more singers mean quieter?
« on: 04/04/2013 20:11:30 »
If one person sings, (she) sings at a certain volume. If two people sing the same notes at the same volume, the sound waves could add together or cancel out, so at times it might sound louder and at other times quieter. With a thousand singers though, the sounds should for the most part cancel out.

If the voices of a thousand singers all produced sound waves which added together, they'd be a thousand times louder than a single singer, but that clearly never (or almost never) happens, so there must be a lot of cancelling out going on. Of course, there's rarely going to be a point where all the singers cancel out completely either, as at any moment there will be a random excess of them pushing the sound waves one way more than the other. That means a choir should actually continue to get louder as the number of singers goes up (because the size of the random excess will grow with it), but how much does the volume actually go up on average in proportion to the number of singers? Is there a formula for this?
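There is a way to put rough numbers on that random excess: if the singers' phases are uncorrelated, the summed amplitude behaves like a random walk, growing as the square root of the number of singers, so the sound power grows in direct proportion to the number of singers. A small simulation of this (illustrative Python only; mean_power is a made-up helper, and each singer is idealised as a unit-amplitude wave with a random phase):

```python
import math
import random

def mean_power(n_singers, trials=2000):
    """Average instantaneous power of n unit-amplitude waves, each with a
    uniformly random phase, averaged over many random trials."""
    total = 0.0
    for _ in range(trials):
        re = im = 0.0
        for _ in range(n_singers):
            theta = random.uniform(0.0, 2 * math.pi)  # one random phase per singer
            re += math.cos(theta)
            im += math.sin(theta)
        total += re * re + im * im  # squared summed amplitude = power
    return total / trials

for n in (1, 10, 100):
    print(n, round(mean_power(n) / n, 2))  # the ratio hovers near 1.0
```

The ratio staying near 1.0 is the formula being asked about: average power is proportional to the number of singers, and amplitude to its square root.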

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #1 on: 04/04/2013 20:58:33 »
When you add an extra singer, the total sound power increases.
  • It is true that at some instants, in some listening positions, at some frequencies, the sound from the second singer will cancel out the sound from the first singer
  • ...but this will be overridden by the fact that at most frequencies, in most listening locations, most of the time, the sound power will be higher.
  • To get a long-lasting and deep sound cancellation, you need very precise sine-wave generators, not human singers.

The effect you notice is that in many areas of human perception, the response to stimulus is logarithmic, not linear.
  • This is why the Decibel scale of sound perception is calculated as 10*LOG(P1/P2).
  • Adding a second singer increases the sound level by 3dB, which is just perceptible to the human ear under normal conditions
  • Adding 9 singers increases the sound level by 10dB, which is audibly much louder
  • Adding 99 singers increases the sound level by 20dB
  • Adding 999 singers increases the sound level by 30dB
  • There is the same perceived increase in volume in going from 1 to 10 singers as there is going from 100 to 1000.
See: http://en.wikipedia.org/wiki/Sound_pressure#Sound_pressure_level
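The bullet figures above all come from that one formula; a tiny check (illustrative Python):

```python
import math

def db_increase(n_singers):
    """Level increase over one singer, assuming incoherent (power-adding) voices."""
    return 10 * math.log10(n_singers)

for n in (2, 10, 100, 1000):
    print(n, round(db_increase(n), 1))
# 2 -> 3.0, 10 -> 10.0, 100 -> 20.0, 1000 -> 30.0
```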

*

Offline RD

  • Neilep Level Member
  • ******
  • 8169
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #2 on: 05/04/2013 00:45:35 »
To get a long-lasting and deep sound cancellation, you need very precise sine-wave generators ...

and the sound waves must be in anti-phase ... http://en.wikipedia.org/wiki/Phase_%28waves%29#Phase_difference
« Last Edit: 05/04/2013 00:48:05 by RD »

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #3 on: 05/04/2013 21:51:39 »
When you add an extra singer, the total sound power increases.
  • It is true that at some instants, in some listening positions, at some frequencies, the sound from the second singer will cancel out the sound from the first singer
  • ...but this will be overridden by the fact that at most frequencies, in most listening locations, most of the time, the sound power will be higher.

The sound power must also be higher when you generate more sound to cancel sound, and yet the volume somehow goes down. The difficulty I'm having with this is that with a large number of singers you should get substantial amounts of the same kind of cancellation because the alignments will become better and better as the numbers go up.

Quote
  • To get a long-lasting and deep sound cancellation, you need very precise sine-wave generators, not human singers.

It shouldn't matter if the voices are wobbly - if there are enough of them, the precision should improve because every wobble will end up being cancelled out by an opposite wobble (ignoring any random excesses that take things more in one direction than the other).

Quote
The effect you notice is that in many areas of human perception, the response to stimulus is logarithmic, not linear.

I thought scales of this kind were only logarithmic in order to cover a huge range without the values becoming unmanageable and that any logs in formulae are there for conversion purposes to fit in with the distorted scale.

Quote
  • This is why the Decibel scale of sound perception is calculated as 10*LOG(P1/P2).
  • Adding a second singer increases the sound level by 3dB, which is just perceptible to the human ear under normal conditions
  • Adding 9 singers increases the sound level by 10dB, which is audibly much louder
  • Adding 99 singers increases the sound level by 20dB
  • Adding 999 singers increases the sound level by 30dB
  • There is the same perceived increase in volume in going from 1 to 10 singers as there is going from 100 to 1000.

That appears to allow no room for any cancellation to take place at all, and yet with a large choir there should be total cancellation as every wave will be cancelled out by an opposite one. What am I missing?

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #4 on: 05/04/2013 23:10:06 »
Interference from 2 point sources is illustrated in the diagram showing interference fringes at the bottom of this webpage:

Note that:
  • This assumes the "ideal" case of 2 point sources, of identical and constant frequency, intensity, and phase
  • Cancellation is only complete at particular points where the listener hears precisely equal intensity from the 2 sources, and 180 degrees out of phase. At all other points, there is some sound power present.
  • At an equal number of points, the sound power is doubled through constructive interference, where the listener hears precisely equal intensity from the 2 sources, and 0 degrees phase (in-phase).
  • The average effect, over the entire listening space, is that the sound power is doubled.
  • The human voice is not a pure, constant frequency of fixed phase - it consists of a number of higher-frequency resonances (or formants) which occur at different frequencies for different people because of the shape of their throat (even if they are singing the same fundamental frequency). Different frequencies will not cancel each other at all - they add in power.
  • If the two singers do happen to produce the same higher frequency, it will cancel at different positions in the listening space than the fundamental frequency, so those who would hear nothing from a pair of sine-wave generators would hear something from a pair of human singers.
  • Cancellation occurs only if the two sources have equal amplitude. However, the vocal resonances are excited in bursts by the opening and closing of the vocal folds or "Adam's apple", which means that the amplitude of sound from the two singers is not constant, and so any cancellation will be short-lived. These bursts occur asynchronously in different singers, and so the sound power adds.
  • Each of these resonances is not a single frequency, but consists of a variety of nearby frequencies, and so these do not cancel out between different singers
  • A human singer cannot keep the phase of their voice constant, so the audience position of the cancellations and additions will change dynamically - the average effect is that the sound power will be doubled
  • The human head does not have a single point receiver - it is a distributed receiver. If one ear hears a cancellation, the other ear almost certainly will not. The overall effect of more singers is more received power. (Modern mobile phones and WiFi devices use the same technique.)
  • When you add multiple point sources (eg 3 or more singers), the probability of getting a total cancellation drops, since there are very few points in which the amplitude and phase cancel for all 3 singers, at all of their formants.
So, sound cancellation does occur temporarily at particular points in the audience, but it is more than overcome by the increase in transmitted (and received) sound power at multiple frequencies and phases at every point in the audience as you add more singers.
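The "average effect is that the sound power is doubled" claim for two equal sources can be checked by averaging the received power over all relative phases, a crude stand-in for all listening positions (illustrative Python; avg_power_two_sources is a made-up helper):

```python
import cmath
import math

def avg_power_two_sources(samples=100000):
    """Average received power from two equal unit sources, averaged over a
    uniform spread of relative phase between them."""
    total = 0.0
    for k in range(samples):
        phase = 2 * math.pi * k / samples
        # |1 + e^(i*phase)|^2 ranges from 0 (anti-phase) to 4 (in-phase)...
        total += abs(1 + cmath.exp(1j * phase)) ** 2
    return total / samples

print(round(avg_power_two_sources(), 3))  # -> 2.0: twice one source's power
```

The complete cancellations (power 0) and the coherent reinforcements (power 4) average out to exactly 2, not 1.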
« Last Edit: 06/04/2013 00:43:03 by evan_au »

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #5 on: 06/04/2013 22:10:29 »
At an equal number of points, the sound power is doubled through constructive interference, where the listener hears precisely equal intensity from the 2 sources, and 0 degrees phase (in-phase).

The average effect, over the entire listening space, is that the sound power is doubled.

How can it be doubled on average when it's only doubled in some places and cancelled out entirely in others (while being somewhere in between at the rest)? On average, it looks as if it should be the same as with one sound source.

Quote
The human voice is not a pure, constant frequency of fixed phase - it consists of a number of higher-frequency resonances (or formants) which occur at different frequencies for different people because of the shape of their throat (even if they are singing the same fundamental frequency). Different frequencies will not cancel each other at all - they add in power.

The larger the number of singers, the less relevant those differences become: most sounds should be cancelled out by other sounds and the volume should drop.

Quote
If the two singers do happen to produce the same higher frequency, it will cancel at different positions in the listening space than the fundamental frequency, so those who would hear nothing from a pair of sine-wave generators would hear something from a pair of human singers.

But again in a large group there would be other sounds which cancel the other frequencies from them as well.

Quote
Cancellation occurs only if the two sources have equal amplitude. However, the vocal resonances are excited in bursts by the opening and closing of the vocal folds or "Adam's apple", which means that the amplitude of sound from the two singers is not constant, and so any cancellation will be short-lived. These bursts occur asynchronously in different singers, and so the sound power adds.

Again this is something that would be evened out in a large group and lead to a reduction in volume. The same applies to many of your later points, so I won't keep repeating it.

Quote
When you add multiple point sources (eg 3 or more singers), the probability of getting a total cancellation drops, since there are very few points in which the amplitude and phase cancel for all 3 singers, at all of their formants.

I wouldn't expect a total cancellation, but it still looks as if there should be enough cancellation to reduce the volume as the numbers go up, assuming an even distribution of emitted sounds, but then you are never going to get an even distribution - there will be random weightings which fluctuate, and those excesses will by themselves always lead to larger groups being louder as the numbers go up. The point of my original post was that this might be the real mechanism behind it and that it may not have been looked at properly by science. A soloist can still be heard over a choir of a hundred voices all singing at perhaps half power - they are not fifty times louder.

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #6 on: 07/04/2013 00:29:45 »
Quote
distorted scale

The decibel scale actually reflects the behaviour of human hearing:
  • The ear is continually bombarded by the random impacts of air molecules, moving at just about the speed of sound
  • And yet (under laboratory conditions) it can pick out the periodic motion of air molecules, moving with an amplitude of the width of a hydrogen atom
  • Painfully loud sounds are about 100dB louder, a range of power of about 10^10
  • The human auditory system compresses this huge range into a more manageable scale
  • ...and it's not just for Sound: Psychological studies of human response to wealth (eg gold), food and pain show a similar logarithmic response, spanning many orders of magnitude in stimulus.
So the scale is not distorted - it is actually a reasonable reflection of human perception of many kinds of stimuli.

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #7 on: 07/04/2013 08:16:39 »
A friend who sings with a choir joked today that if you have more than 1 singer, all of the singers are singing quietly so they can hear and follow the other singers...

Quote
A soloist can still be heard over a choir of a hundred voices all singing at perhaps half power - they are not fifty times louder.
These days, if you have a nominated soloist, they usually have their own microphone, which gets turned up louder than all the other people in the choir, so you can still hear them clearly.

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #8 on: 07/04/2013 20:34:20 »
And yet (under laboratory conditions) it can pick out the periodic motion of air molecules, moving with an amplitude of the width of a hydrogen atom.

That's quite a thought. I suppose the cochlea amplifies it by narrowing, generating an effect like the Severn Bore to make it detectable.

Quote
So the scale is not distorted - it is actually a reasonable reflection of human perception of many kinds of stimuli.

It's clearly distorted, designed as it is to fit in with our use of base ten. It also muddies the waters when you're trying to understand things in terms of doublings of volume.

Anyway, I've been adding waves together in a computer by taking identical waves and offsetting them with an even distribution, the result being that they appear to cancel each other out, much as should happen with most of the sound made by a large number of singers, though I can't work out where the energy goes to as it can't be cancelled.
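For what it's worth, identical waves offset with evenly distributed phases really do sum to exactly zero at every instant - a quick check (illustrative Python; summed_wave is a made-up helper). For real sources, though, the even spacing can only hold at one listening point at a time, so the energy is not destroyed but redistributed to points where the waves reinforce instead:

```python
import math

def summed_wave(n, t):
    """Sum of n unit sine waves whose phase offsets are spread evenly over one cycle."""
    return sum(math.sin(t + 2 * math.pi * k / n) for k in range(n))

# At any instant t, the evenly offset waves cancel to (numerically) zero:
print(abs(summed_wave(8, 0.7)) < 1e-9)    # True
print(abs(summed_wave(1000, 1.3)) < 1e-6) # True
```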

I've also been looking at the component sounds of vowels in a wave editor (which I have just started writing), and they do appear, as you would expect, to be combinations of a handful of near-perfect sine waves. The vowel "ee" looks particularly simple, probably just being two sine waves with a quiet high-pitched harmonic imposed on a deeper note. I'm writing more software now to check this by breaking the sound down into the individual waves that make up the more complex ones (though I'm doing this work primarily as an attempt to add speech recognition to my own operating system).

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #9 on: 07/04/2013 21:01:54 »
When you measure the amplitude of a sound (eg the vowel "ee") with a microphone feeding into a computer, you are actually measuring the Sound Pressure Level.

The microphone is insensitive to the constant air pressure, and only measures the variation from the average. The average variation/value of a sine wave is zero. So if you average many sine waves, the average will still be zero.

However, the power & energy of the sound is proportional to (SPL)^2. The average of sin^2 is not zero, and if you add up the energy of many sine waves, the average is also not zero.

This is why the calculation of sound levels in decibels has two formulae: 10*LOG(P1/P2) when comparing powers, and 20*LOG(A1/A2) when comparing amplitudes.
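The distinction between a zero average pressure and a non-zero average power can be seen directly (illustrative Python):

```python
import math

N = 100000
# One full cycle of a unit sine wave, sampled N times.
samples = [math.sin(2 * math.pi * k / N) for k in range(N)]

mean_pressure = sum(samples) / N              # the SPL averages to ~0
mean_power = sum(s * s for s in samples) / N  # the power averages to ~0.5

print(abs(mean_pressure) < 1e-9, abs(mean_power - 0.5) < 1e-6)  # True True
```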

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #10 on: 07/04/2013 21:25:45 »
The usual way of doing speech recognition starts by breaking the speech into short segments (eg 5-10ms) and performing a Fast Fourier Transform to identify the main frequencies in this segment. For vowel sounds, there will be several formants which can be used to distinguish the different vowels; for unvoiced sounds like "sh", "s" and "f", there is a white-noise type spectrum.

But the characteristics of speech change rapidly, especially around plosives like "p" and "b", so the changes in successive segments must be tracked.

These sequences must then be mapped onto candidate words and sentences, taking into account that different people have different pitches to their voice, different regional accents, and different speeds and intonation. 
See http://en.wikipedia.org/wiki/Speech_recognition#Algorithms
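A minimal sketch of the segment-plus-frequency-analysis step described above (illustrative Python; it uses a naive DFT, which computes the same result an FFT computes faster, and the test signal and frequencies are made up):

```python
import cmath
import math

def dft_magnitudes(segment):
    """Naive DFT: magnitude of each frequency bin up to half the segment length."""
    n = len(segment)
    return [abs(sum(segment[t] * cmath.exp(-2j * math.pi * f * t / n)
                    for t in range(n))) for f in range(n // 2)]

# A fake 10 ms segment at 8 kHz: a 200 Hz fundamental plus a 1000 Hz "formant".
rate, n = 8000, 80
segment = [math.sin(2 * math.pi * 200 * t / rate)
           + 0.5 * math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]

mags = dft_magnitudes(segment)
peaks = sorted(range(len(mags)), key=mags.__getitem__, reverse=True)[:2]
print(sorted(f * rate / n for f in peaks))  # -> [200.0, 1000.0]
```

Picking the strongest bins per segment, and tracking how they move between segments, is the front end the algorithms above build on.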

These algorithms are not simple, so why not start from some freeware or open-source speech recognition software, eg:
http://savedelete.com/7-best-free-speech-recognition-software.html
« Last Edit: 07/04/2013 22:06:22 by evan_au »

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #11 on: 08/04/2013 00:37:40 »
When you measure the amplitude of a sound (eg the vowel "ee") with a microphone feeding into a computer, you are actually measuring the Sound Pressure Level.

The microphone is insensitive to the constant air pressure, and only measures the variation from the average. The average variation/value of a sine wave is zero. So if you average many sine waves, the average will still be zero.

I see, so you could somehow have lots of sound and anti-sound where nothing is heard, but the pressure is constantly higher while it's going on. I'm trying and still failing at the moment to imagine this working with two sine waves where they cancel each other out: one is sending out a "wave" of increased pressure (the "wave" being the part of a wave where air is being pushed away) while the other is sending out a "wave" of reduced pressure, and the two should cancel out without any overall increase in pressure. There must be some serious flaw somewhere in my way of looking at this.


The usual way of doing speech recognition starts by breaking the speech into short segments (eg 5-10ms) and performing a Fast Fourier Transform to identify the main frequencies in this segment. For vowel sounds, there will be several formants which can be used to distinguish the different vowels; for unvoiced sounds like "sh", "s" and "f", there is a white-noise type spectrum.

Yes, I discovered many years ago that you could easily distinguish between all speech sounds by looking at a display of highly degraded input - I did some experiments on a ZX Spectrum+3 (bought 15 years ago for £2 in a jumble sale - first machine I ever programmed on). By speaking into it via the tape interface I was able to record "sound" and play it back through the TV speaker where speech was only just intelligible. When the microphone/speaker's moving part (technical term for this component not known to me) was to one side it was recorded/sent as a 1, and when to the other side it was a 0, so the only indication of how far away from the centre it ever moved was how long it took to switch from 0s to 1s and back. This is like working with .wav files where there is only a middle and top position displayed (and stored). Even with so little signal to work with though, I could read the speech sounds just from looking at the clumps of zeros and ones on the screen, distinguishing patterns in the white noise of fricatives/sibilants as easily as with the vowels.

Quote
But the characteristics of speech change rapidly, especially around plosives like "p" and "b", so the changes in successive segments must be tracked.

With plosives you get momentary bursts of fricatives which are too short to notice when listening but which are still picked up by the ear and which provide the distinctive character of the plosives: "t" gives you a little burst of "s", for example.

Quote
These algorithms are not simple, so why not start from some freeware or open-source speech recognition software, eg:
http://savedelete.com/7-best-free-speech-recognition-software.html

It would be no fun just getting it all from someone else, but I'm also sure it can be done a lot better, and starting out by studying how it's normally done isn't likely to lead to finding better ways to attack the problem. Also, however efficient the Fourier Transform may be, surely it requires you to turn the wave into an equation to start with? That looks like a complex task which is likely to cost as much time as it saves.

When thinking about the cochlea a while back, it struck me that the hairs are probably tuned to resonate at particular frequencies (one frequency per hair) to detect whether those frequencies are present, thereby saving the brain from doing a lot of complex maths. There's an obvious way to do the equivalent of that using very simple code which I am going to try out: pick a frequency, count +ve displacement for half the time, then count -ve displacement for the other half, and spread this over three wavelengths; then you can tell by combining the two counts if there's a sound present either at or near to that frequency (though you'd need to do it from three staggered starting points to guarantee that one of them will detect a signal if there is one).

If (and only if) there is a sound at or near that frequency, repeat the measurement over a larger number of wavelengths and for different frequencies nearby to pin down which is/are still potentially present. Where two nearby frequencies might temporarily be in sync, neither can be ruled out at that moment, but each can be picked up or ruled out further away where they should be out of sync, at which point you should discover whether you're dealing with one frequency or two similar ones (or more).

The tighter you want to pin it down, the more processing you have to do, but it would be restricted to those areas where the possibilities are still open - if the first test finds nothing, there is no further processing to do near that frequency at that point. Perhaps 3 or 4 tests per octave (major or minor third intervals) over a range of five or six octaves would be applied at any one point, and those that find a potential resonance would then be retested to pin them down to quarter-tone resolution (or better).
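The counting scheme described here amounts to correlating the signal with a square wave at the probe frequency: add the samples during the probe's positive half-cycles and subtract them during the negative ones. A rough sketch under that reading (illustrative Python; probe is a made-up helper, and the staggered starting points and retesting steps are left out):

```python
import math

def probe(signal, rate, freq, cycles=3):
    """Score how strongly `signal` resonates at `freq`: sum samples during the
    probe's +ve half-cycles, subtract them during its -ve half-cycles."""
    n = int(cycles * rate / freq)
    score = 0.0
    for t in range(n):
        half = int(2 * freq * t / rate) % 2  # 0 = +ve half-cycle, 1 = -ve
        score += signal[t] if half == 0 else -signal[t]
    return abs(score) / n

rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]

# A longer window (more wavelengths) narrows the filter, as the post suggests:
hit = probe(tone, rate, 440, cycles=20)
miss = probe(tone, rate, 700, cycles=20)
print(hit > miss)  # True: the matching probe scores far higher
```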

If a more mathematical approach is actually faster, I can always replace part of my code with it later on, but what matters now is simply getting it to work.

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #12 on: 08/04/2013 10:05:07 »
Quote
I see, so you could somehow have lots of sound and anti-sound where nothing is heard, but the pressure is constantly higher while it's going on.
Sorry, that wasn't really what I was trying to convey - sine waves won't increase the average "DC" sound pressure, only the "AC" sound pressure. I would complete this sentence as "you could have lots of sound and anti-sound where nothing is heard, but there will be an adjacent location where the sound amplitude is increased."

Quote
thinking about the cochlea...
The Fourier analysis can do a similar frequency analysis to the hairs of the cochlea, picking out the main frequencies in a complex sound.

When a sound consists of a fundamental and higher-frequency components, counting the zero-crossings produces a misleadingly high answer, because the higher frequencies produce extra zero-crossings. You can reduce this impact by low-pass filtering the sound, which will help estimate the fundamental frequency -  but then you lose the higher frequencies.
Vowel recognition is assisted if you collect the frequencies of all formants - and a bank of bandpass filters can certainly do this. The FFT analysis measures all formants simultaneously, with a surprisingly small number of calculations. (The same argument goes for distinguishing fricatives.)
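The zero-crossing pitfall is easy to demonstrate: adding a harmonic leaves the fundamental unchanged but greatly inflates the crossing count (illustrative Python; the frequencies and amplitudes are made up):

```python
import math

def zero_crossings(samples):
    """Count sign changes between successive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)

rate = 8000
# One second of a pure 100 Hz tone, and the same tone with a strong 700 Hz harmonic.
pure = [math.sin(2 * math.pi * 100 * t / rate) for t in range(rate)]
rich = [math.sin(2 * math.pi * 100 * t / rate)
        + 0.8 * math.sin(2 * math.pi * 700 * t / rate) for t in range(rate)]

print(zero_crossings(pure), zero_crossings(rich))  # the harmonic adds many extra crossings
```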

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #13 on: 08/04/2013 19:12:22 »
Sorry, that wasn't really what I was trying to convey - sine waves won't increase the average "DC" sound pressure, only the "AC" sound pressure. I would complete this sentence as "you could have lots of sound and anti-sound where nothing is heard, but there will be an adjacent location where the sound amplitude is increased."

But if it's only increased to twice the volume there, the average volume will be less than twice as high as with one singer, and it still looks as if it should be the same as the volume with one singer. I must be making an error somewhere there, but I still can't see what it is.

Quote
The Fourier analysis can do a similar frequency analysis to the hairs of the cochlea, picking out the main frequencies in a complex sound.
...
The FFT analysis measures all formants simultaneously, with a surprisingly small number of calculations. (The same argument goes for distinguishing fricatives.)

How do you get the raw data into the right form to start with though? I'd have thought the process of turning a stretch of wave into some kind of equation for that wave would involve a large number of steps which would be just as complex as my way of analysing it, and then with the addition of having to do a further bit of maths with it to transform it into a series of individual component waves afterwards.

Quote
When a sound consists of a fundamental and higher-frequency components, counting the zero-crossings produces a misleadingly high answer, because the higher frequencies produce extra zero-crossings. You can reduce this impact by low-pass filtering the sound, which will help estimate the fundamental frequency -  but then you lose the higher frequencies.

Yes, the zero-crossing on the ZX Spectrum was missing a lot of the detail of the story, but it's still clear from what it was picking up that the task is simpler than a lot of people imagine it is - you can do an initial analysis on a tiny amount of extremely impoverished data and get practically the full story from that (in terms of recognising speech sounds), and I see that as a possible route to making the analysis faster. I am working with proper .wav files this time though, and that makes it possible to pick up every detail of the story, opening the door to doing all other kinds of sound recognition. All I have to do is compare +ve with -ve movement (without waiting for it to cross the zero line) - if there's enough movement to make a hair tuned to a particular frequency resonate, the sound will be detected by the ear and must also be detectable by my method of simulating that, so I should be able to match the ability of the ear. I just have to do lots of experiments to work out how many wavelengths need to be processed to get enough information out, and find ways to deal with what happens when there are complications like the pitch of the base note changing during a word, but there is without question a solution there waiting to be found which doesn't depend on any complex maths, and it may turn out to be quicker.

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #14 on: 08/04/2013 22:08:20 »
Quote
But if it's only increased to twice the volume there, the average volume will be less than twice as high as with one singer
The Sound Pressure Level is twice as high where there is constructive interference, which means that the power is 4 times higher, because Power is proportional to (SPL)^2.
Overall, SPL is increased by a factor of 1.4, and the audio power increases by (1.4)^2, ie doubled.
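The two-singer arithmetic in one place (illustrative Python):

```python
import math

# Where the two voices reinforce: pressure doubles, so power quadruples.
coherent_db = 20 * math.log10(2)  # amplitude formula
# Averaged over all listening positions: power doubles.
average_db = 10 * math.log10(2)   # power formula

print(round(coherent_db, 1), round(average_db, 1))  # -> 6.0 3.0
```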

Quote
compare +ve with -ve movement
This method of analysis is equivalent to high-pass filtering, so it will emphasise the higher formants, rather than detecting the fundamental tone. From what I have read, you get most information from the lowest formant, and successively less from each higher formant.

Quote
How do you get the raw data into the right form to start with though?
The functions of speech recognition are normally broken into different modules, which ideally operate in parallel:
  • Sound Pressure Level collection from the microphone port (producing samples of a .wav file)
  • Running the frequency analysis algorithm
  • Matching the audio samples to potential syllables & words
  • Matching the potential words to syntactically valid sentences
  • Running applications under the operating system
  • If hardware support is available, these can truly run in parallel (like human hearing); if only software support is available, you must simulate some parallelism by using interrupt routines.
  • Unfortunately, the frequency analysis algorithm is very processor-hungry (FFT and bandpass filtering is best done on a Digital Signal Processor, if you have one)

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #15 on: 09/04/2013 00:31:38 »
The Sound Pressure Level is twice as high where there is constructive interference, which means that the power is 4 times higher, because Power is proportional to (SPL)2.
Overall, SPL is increased by a factor of 1.4, and the audio power increases by (1.4)2, ie doubled.

So does that mean it actually sounds four times as loud at those points where the two waves add together? I suppose it must do if there's to be an increase in volume overall.

I still have a problem though with a large group of singers, or sources of sound where it still looks as if there will always be cancelling out going on to the point that the volume goes down. In the simplest case of this we could have a single speaker with multiple inputs trying to make it do different things, all of which add up to zero movement. The power is applied, but it's all used in opposing other applications of power. In this case no volume of sound is produced at all. If you then introduce more than one speaker, the same kind of thing could apply, but in this case the speakers would be restricting each other's movement to reduce the volume. In the same way with a choir, the sound made by singers could reduce the volume generated by other singers. Is that not possible?

Quote
Quote
compare +ve with -ve movement
This method of analysis is equivalent to high-pass filtering, so it will emphasise the higher formants, rather than detecting the fundamental tone. From what I have read, you get most information from the lowest formant, and successively less from each higher formant.

I'm starting out by looking for sounds of low frequency, and the same method really ought to work - the cochlea can't be doing anything substantially differently, and I very much doubt that the brain does complex maths to solve problems of this kind. It may turn out to be necessary to build up more of a picture of the average location of the wave over each half of a cycle, but the numbers I'm getting out so far make it look as if it may be viable without going that far. Also, the ZX Spectrum experiments show that measuring the time spent to either side of the zero line will easily identify the base note, so that's another simple approach which could be combined with what I'm already doing. I'll know more once I've got a bit more code written and can start to display the results in spectrographic form (there's probably a proper name for the kind of diagram I have in mind, but if so it escapes me).
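Measuring the time either side of the zero line does recover the base note cleanly when the wave is dominated by it - a sketch (illustrative Python; fundamental_from_crossings is a made-up helper, and it assumes any strong harmonics have already been filtered out, for the reason given earlier about extra crossings):

```python
import math

def fundamental_from_crossings(samples, rate):
    """Estimate pitch from the average interval between rising zero crossings."""
    rises = [t for t in range(1, len(samples))
             if samples[t - 1] < 0 <= samples[t]]
    if len(rises) < 2:
        return None
    period = (rises[-1] - rises[0]) / (len(rises) - 1)  # samples per cycle
    return rate / period

rate = 8000
tone = [math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]
print(round(fundamental_from_crossings(tone, rate)))  # -> 220
```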

Quote
Quote
How do you get the raw data into the right form to start with though?
The functions of speech recognition are normally broken into different modules, which ideally operate in parallel:

My question wasn't about all the rest of the process, but purely about how you would actually feed the numbers from a .wav file into a formula that can do something useful with them. I've now got a slight hint as to how that's done from the http://en.wikipedia.org/wiki/Fast_Fourier_transform page, but I still can't gauge how much processing it involves to get a proper feel for whether it really has an advantage or not.

The rest of the speech recognition process is obvious, though I will add another step involving semantics (which is where this ties in with my linguistics work, but I don't intend to discuss that).

Quote
Unfortunately, the frequency analysis algorithm is very processor-hungry (FFT and bandpass filtering is best done on a Digital Signal Processor, if you have one)

For the moment I'm not keen to write device drivers for a multiplicity of sound cards, so I'm restricted to using just the CPU and FPU for everything, getting all the sound recordings in as .wav files. If I can get to the point where it all works, I'll then look at sound cards and write device drivers for the machines I program on so that I can get sound in from a microphone and use the speakers. After that I can look at the extra processing the sound card can do and think about changing to FFT if it looks more efficient, but I can get all the main work done in advance of that and find out exactly what can be done by simpler analysis (of the kind which I suspect the brain actually does). I'm planning to release my code for this part of the process if I can get it to work (just the code for turning the .wav data into a spectrograph of component sounds), so I'll let you know where you can download it if I manage to get it working.
« Last Edit: 09/04/2013 00:34:59 by David Cooper »

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #16 on: 09/04/2013 18:47:19 »
I'm not sure the base note is going to be as important as I'd thought - it looks as if the vowels are white noise hovering around the frequencies of a series of harmonics not connected with the base note, but related to the size of your mouth/throat. I was looking at the wave for an ee with a falling base note and it was clear that the high note on top of it was maintaining its frequency throughout rather than falling to keep level with the base note.

If you whisper rather than speak, you can hear their invariable pitch without being distracted by what the vocal cords are doing (though they will doubtless vary between different people). The sequence of harmonics if we start on a convenient note is: D, D', S', D", M", S", Tb", D"', R"', M"', and the vowels seem to be distributed starting from the second of those: oo, oh, aw, ah, a, uugh, eh, ay, ee. If you alternate between whispering aw (the vowel in "saw") and eh (the vowel in "pen") you can hear that the jump between them is an octave.

Edit: there's another white noise "note" that goes down while the other one goes up, at least with the top three vowels, so it may be possible to distinguish them from those two components alone, but I'm not ready to start programming that part of the process yet. I have now got it picking out all the frequencies in whole-tone steps, using 4 wavelengths to pick up white noise and 16 wavelengths for anything close to a sine wave. It looks as if most of the useful work in speech recognition depends on the white noise components, and they involve less processing so it's looking hopeful.
« Last Edit: 11/04/2013 19:27:29 by David Cooper »

*

Offline wolfekeeper

  • Neilep Level Member
  • ******
  • 1092
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #17 on: 27/04/2013 00:26:06 »
The equation for the amplitude of a group of singers all singing with the same volume is:

Av = Vs SQRT(n)

where n is the number of singers, Vs is the volume of the singers, Av is the overall volume

This assumes that the singers voices are not well correlated, and are adding more or less randomly- and that while they may be roughly the same pitch, they're not exactly in phase. For human singing this is highly likely to be a very good approximation.

However, for people listening, you want to know the perceived volume. Humans perceive 'loudness' logarithmically, so, taking logs of the above equation, the perceived loudness grows only slowly (roughly with the logarithm of n) rather than in proportion to the number of singers.
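A quick Monte-Carlo check of that formula (a sketch, with an arbitrary trial count and nothing calibrated to real voices): sample the sum of n unit-amplitude sine waves with independent random phases and compare its RMS with the predicted square-root scaling.

```python
import math
import random

def rms_amplitude(n, trials=5000):
    """Estimate the RMS amplitude of the sum of n unit sine waves with
    independent random phases, by sampling the sum at random instants."""
    total_sq = 0.0
    for _ in range(trials):
        s = sum(math.sin(random.uniform(0.0, 2.0 * math.pi))
                for _ in range(n))
        total_sq += s * s
    return math.sqrt(total_sq / trials)

random.seed(0)
for n in (1, 4, 16, 100):
    # a single sine has RMS 1/sqrt(2), so the prediction is sqrt(n/2)
    print(n, round(rms_amplitude(n), 2), round(math.sqrt(n / 2), 2))
```

Each printed estimate should land close to sqrt(n/2): the amplitude grows as the square root of the number of singers, not in proportion to it.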

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #18 on: 27/04/2013 21:37:21 »
The equation for the amplitude of a group of singers all singing with the same volume is:

Av = Vs SQRT(n)

where n is the number of singers, Vs is the volume of the singers, Av is the overall volume

This assumes that the singers voices are not well correlated, and are adding more or less randomly- and that while they may be roughly the same pitch, they're not exactly in phase. For human singing this is highly likely to be a very good approximation.

Thanks for that formula. Again it suggests the volume will keep going up indefinitely as the number of singers goes up, but that's the part I'm questioning. Let me simplify things a bit more. Suppose all the singers are singing into microphones which are all wired up to a single speaker elsewhere, and it's the volume coming from that speaker that we're interested in rather than hearing the singers directly. Let's also assume that the speaker is capable of producing infinite volume if required. What I want to understand is why they don't all cancel out once there are enough of them to ensure that for every force trying to push the speaker one way there will always be an equal force trying to push it the other way. The formula doesn't appear to address that.

*

Offline wolfekeeper

  • Neilep Level Member
  • ******
  • 1092
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #19 on: 27/04/2013 22:32:37 »
In the exact situation you describe they add statistically; you can show that the standard deviation of the mean of n signals goes as 1/sqrt(n) of the deviation of a single one (the standard deviation being the square root of the mean of the squared deviations).

But the standard deviation of an audio signal - the root of the average squared deviation from the mean - is its amplitude.

So if there are n of them, the sum is n times the mean: multiply that 1/sqrt(n) by n and you get a total amplitude that goes as the square root of n.

I mean, ten people are going to be louder than one.

But to a degree they will cancel out, so the amplitude won't be ten times bigger.

It turns out the amplitude goes as the square root of the number of people.
« Last Edit: 27/04/2013 22:38:39 by wolfekeeper »

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #20 on: 28/04/2013 20:02:50 »
I'll have to put it to test with a proper computer program to settle this, though it'll probably need millions of genuinely random numbers to make it work properly.

*

Offline wolfekeeper

  • Neilep Level Member
  • ******
  • 1092
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #21 on: 28/04/2013 20:42:59 »
By all means do so.

Of course, in a more obvious and physical sense, rather than the abstract maths, the amplitude has to go as the square root; if it didn't, energy would disappear. The square of the amplitude is the intensity - the power - so if you have n singers, the power needs to go up in proportion to n.

If the cancellation idea worked when several people sang together then, to avoid violating conservation of energy, power would have to be forced back into the singers: you would find it progressively harder to sing, the more people were singing!

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #22 on: 29/04/2013 22:53:51 »
If the cancellation idea worked when several people sang together then, to avoid violating conservation of energy, power would have to be forced back into the singers: you would find it progressively harder to sing, the more people were singing!

Which is exactly like the speaker being unable to move because it's being pushed both ways equally strongly, but it needn't be total silencing in either case, so I'm wondering if it actually does happen to a substantial degree.

I've managed to simplify the problem further by avoiding working with complex waves and simply working with one point on a wave - any one point will do as the probabilities for that point will be the same as for any other point. This means there's no need to write a program to do anything hugely complex as it's now relatively easy, or at least it would be if I knew the right way to apply the maths.

If we imagine that the wave can wander between -1 and 1 for a single singer (or a musical note), then the probability that a point on the wave will have a particular value varies across that range, with the point more likely to be at one of the extremes than in the middle because the wave lingers around those values longest. Let's just say for now, though (wrongly), that the probability is equal for any value between -1 and 1.

With two singers, we need to extend the range to between -2 and 2, but the probabilities are now different as they're built out of two random numbers which are added together, so this makes the point more likely to be in the middle of the range than at the extreme ends. With a hundred singers, we have a hundred random numbers to add together (each between -1 and 1), and this further weights things towards the middle and well away from the extremes, so the wave will only very rarely get close to -100 or 100, if ever. With a million, billion or trillion, it gets ever closer to the middle (in relative terms), and I was imagining it ending up becoming so restrained by these probabilities that most of the noise would be cancelled out.

So what I've now done is apply some numbers to it, as follows. It's easiest at this point to replace the sine-style waves with square waves to eliminate the time the wave spends at in-between values, so for one "singer" we now have the wave spending half the time at -1 and the other half at 1, and either of these values represents a volume of 1. With two "singers" we can now get combined values of -2, 0 or 2: for a quarter of the time it will be -2, for another quarter it will be 2, and for half the time it will be 0. The volume will now be the average of two lots of 2 and two lots of 0, so that's 1 again, meaning no change in volume.

The same can be done with three "singers": there will be one -3, three lots of -1, three lots of 1, and one 3, so that's 12/8 and an average volume of 1.5.

And for four "singers": there will be one -4, four lots of -2, six lots of 0, four lots of 2 and one 4, so that's 24/16 for an average volume of 1.5.

With five "singers" there will be one -5, 5 lots of -3, 10 lots of -1, 10 lots of 1, 5 lots of 3 and one 5, so that's 60/32 for an average volume of 1.875.

Now, these numbers look strange (perhaps because there's some problem with using square waves in the first place, although something very close to square waves is possible), but maybe I'm adding the waves together wrongly. Adding a -1 to 0 is certainly correct, but adding a -1 to a -1 maybe shouldn't add up to -2, because the speaker could resist the travel more strongly when the second -1 force is applied on top of the original -1 - although when you dangle weights under a spring, the spring moves just about the same distance when a second weight is added as it did for the first weight. Either way, I don't know how to take this any further at the moment, but I'll continue to think about it when time allows.
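The coin-toss enumeration above can be checked exactly with binomial coefficients (a small sketch; each "singer" is at +1 or -1 with equal probability):

```python
from math import comb

def average_volume(n):
    """Exact mean of |sum| over all 2**n equally likely sign patterns."""
    total = 0
    for k in range(n + 1):              # k singers at +1, the rest at -1
        total += comb(n, k) * abs(2 * k - n)
    return total / 2 ** n

for n in range(1, 7):
    print(n, average_volume(n))  # 1.0, 1.0, 1.5, 1.5, 1.875, 1.875
```

For large n this mean grows as sqrt(2n/pi), which is just another face of the square-root law.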
« Last Edit: 29/04/2013 22:56:54 by David Cooper »

*

Offline wolfekeeper

  • Neilep Level Member
  • ******
  • 1092
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #23 on: 29/04/2013 23:43:53 »
No, no. Square wave is perfectly valid.

Actually square wave is the drunkards walk, after n steps, if there are n singers.

http://en.wikipedia.org/wiki/Random_walk#One-dimensional_random_walk

(Actually, I don't like the wiki page, it seems badly written).

Randomly phased/frequency square waves approach a normal distribution with enough waves added together:

http://en.wikipedia.org/wiki/Central_limit_theorem
« Last Edit: 29/04/2013 23:49:51 by wolfekeeper »

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #24 on: 30/04/2013 16:04:20 »
Quote
It turns out the amplitude goes as the square root of the number of people.

I agree - and the sound power grows as the square of the sound pressure amplitude.
So sound power grows in proportion to the number of singers, conserving energy.
But human hearing is logarithmic, so loudness doesn't sound proportional to the number of singers - it's just a perceptual thing.
 
There are two other perceptual mechanisms by which additional singers sound quieter:
  • There is a muscle in your inner ear that can quickly disconnect the bones which transmit the sounds, thus making a sudden loud sound quieter. But this muscle tires easily, and the effect only lasts perhaps 30 seconds.
  • A recent NS interview mentioned a neurological mechanism where continued loud sound makes the hairs in your cochlea less sensitive, over a period of about 20 minutes. This deafness lasts about 12 hours.
The microphone used for David's speech recognition system does not have these perceptual filters, but reports the sound pressure amplitude as a voltage - at least up to an amplitude where the microphone starts distorting, or the voltage exceeds the maximum range of the microphone input on the computer.
« Last Edit: 30/04/2013 21:54:13 by evan_au »

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #25 on: 30/04/2013 23:11:37 »
No, no. Square wave is perfectly valid.

...

Randomly phased/frequency square waves approach a normal distribution with enough waves added together

Okay, so if you remove my simplifications you'd get straight to a graph showing the square root values 1, 1.41, 2, etc., but the square waves will eventually catch up with the graph when it gets to higher values. That means that when we're dealing with a million singers, we've only got 1/1000 of the volume that we'd get if all their sounds were completely in phase with each other such that they added up without any cancellation. They aren't in phase though, so it looks to me as if 99.9% of the sound they're producing must be being cancelled out.

...and the sound power grows as the square of the sound pressure amplitude.
So sound power grows proportional to the number of singers, conserving energy.

Does that sound power include all the cancelled sound though? If not, there's a problem with it when you apply it to a whole lot of sine waves which are in phase with each other such that they add without cancellation, because then if you go from one sine wave with an amplitude of 1 and sound power of 1 to having a thousand sine waves with an amplitude of 1000, the sound power will be a million - that's 1000 times the amount of power put in. This is the key point that is blocking my way to understanding how the volume and power are related.

*

Offline wolfekeeper

  • Neilep Level Member
  • ******
  • 1092
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #26 on: 30/04/2013 23:24:17 »
Okay, so if you remove my simplifications you'd get straight to a graph showing the square root values 1, 1.41, 2, etc., but the square waves will eventually catch up with the graph when it gets to higher values. That means that when we're dealing with a million singers, we've only got 1/1000 of the volume that we'd get if all their sounds were completely in phase with each other such that they added up without any cancellation. They aren't in phase though, so it looks to me as if 99.9% of the sound they're producing must be being cancelled out.
No, volume is best understood as power, rather than amplitude. The power is a million times more with a million singers.

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #27 on: 01/05/2013 18:34:23 »
Okay, so if you remove my simplifications you'd get straight to a graph showing the square root values 1, 1.41, 2, etc., but the square waves will eventually catch up with the graph when it gets to higher values. That means that when we're dealing with a million singers, we've only got 1/1000 of the volume that we'd get if all their sounds were completely in phase with each other such that they added up without any cancellation. They aren't in phase though, so it looks to me as if 99.9% of the sound they're producing must be being cancelled out.

No, volume is best understood as power, rather than amplitude. The power is a million times more with a million singers.

It looks to me as if when the power put in is a million, the power cancelled out is nearly a million, and the power remaining in the sound is going to be a thousand. If you do the experiment with a million sine waves all perfectly aligned though, the amplitude will be a million rather than a thousand and the power in the sound will have to be a million too with no part of the sound cancelled out. If you align them with half opposing the other half, you'll put in a million units of power and get zero power in the sound, all of it being cancelled out instead (and maybe generating heat).

If you were to try to capture energy from sound, this would show how much power is really in the sound. There would be a certain amount of wobble in the wave which would carry extra power beyond the main amplitude, but I'd be surprised if that could make up the missing energy in a case where a million voices are only producing an amplitude of a thousand units.

If I'm right about this, then most sound really is cancelled out whenever there are large numbers of sound makers, but my initial mistake was to think that because the percentage cancelled out was heading for 100% it would eventually lead to silence: what would actually be happening is that the cancelled sound gets closer and closer to 100%, but it always falls short by an amount that grows in absolute terms and never declines. It's a case where we have cancellation percentages going up to 99.90% for a million singers, then 99.99990% for a trillion, but these numbers always end with something that isn't a 9 if you follow them far enough and they should never be rounded up to 100% if you're interested in the volume produced.
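The "never quite 100%" point can be made concrete: relative to the fully in-phase case, the surviving fraction of amplitude is 1/sqrt(n), so the cancelled percentage is 100*(1 - 1/sqrt(n)) (a sketch; the singer counts are arbitrary):

```python
import math

for n in (100, 10_000, 1_000_000, 10**12):
    surviving = 1 / math.sqrt(n)        # amplitude relative to in-phase sum
    cancelled = 100 * (1 - surviving)   # percentage of amplitude cancelled
    print(f"{n:>16,} singers: {cancelled:.6f}% cancelled")
```

The percentage creeps towards 100 but never reaches it, while the absolute surviving amplitude, sqrt(n), keeps growing.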

*

Offline wolfekeeper

  • Neilep Level Member
  • ******
  • 1092
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #28 on: 01/05/2013 19:25:58 »
Well, the amplitude predominately cancels out, according to the square root law, but amplitude is not power.

You have to square amplitude to get power.

I believe that's your central mistake.

It's a bit like electrical voltage. You would think that power goes in proportion to the voltage, but if you think about it, power nearly always goes as the square of the voltage, because when the voltage goes up, so does the current; and it's current times voltage that is power (both DC and AC).

Similarly when you double the pressure/amplitude of the sound, the flow of the air in each vibration goes up too.
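A worked miniature of that analogy (the 8 ohm load is an arbitrary illustrative value):

```python
R = 8.0                      # ohms - an arbitrary illustrative load
for V in (1.0, 2.0, 4.0):
    I = V / R                # current rises in step with the voltage
    P = V * I                # power = voltage * current = V**2 / R
    print(V, "volts ->", P, "watts")
```

Doubling the voltage doubles the current as well, so the power quadruples - the same square law that links amplitude to sound power.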
« Last Edit: 01/05/2013 19:28:44 by wolfekeeper »

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #29 on: 02/05/2013 18:10:26 »
You have to square amplitude to get power.

I believe that's your central mistake.

I'd agree with that except for the problem that when you apply ten sine waves to a speaker with identical alignment, each having an amplitude of 1, the combined amplitude will be 10 rather than the square root of 10, so if you then square the amplitude 10 you get a power of 100 which is ten times as much power as you put in.

*

Offline wolfekeeper

  • Neilep Level Member
  • ******
  • 1092
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #30 on: 03/05/2013 21:43:33 »
This is a bit more subtle.

Basically if you're adding them electronically, there's no problem with that; you can get out more power than you put in, because you have an amplifier!

If, as a separate example, you're not adding them electronically but in the normal way - they're all sending sound at you and it's being picked up by a microphone - then you're allowed by physics to have more power available at a point in space. Conservation of energy only applies to the overall, total energy, and you can get nodes and antinodes with much more or less than the average; by putting the microphone at that point, you've definitely picked an antinode.

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #31 on: 04/05/2013 20:46:07 »
This is a bit more subtle.

Basically if you're adding them electronically, there's no problem with that; you can get out more power than you put in, because you have an amplifier!

You can't hide the problem in the amplifier. If we have a speaker with ten independent sets of magnets controlling its movement, each set of magnets linked to a different input, we can have one input working on its own to produce an amplitude of 1, then add in a second (in exact alignment with the first) to produce an amplitude of 2, etc., all the way up to ten inputs generating a combined amplitude of 10. If you square the 10 to get the power, the official sound power is now ten times the amount of power that was put in.

Do the experiment again with the outputs out of alignment with each other and you'll get a combined amplitude of root 10; when you square that, you get a sound power equal to the amount of power put in. It looks to me, though, as if this isn't the real sound power, because you could only tap a maximum of root 10 units of energy from the sound if you had a mechanism to capture all the power in a sound wave.
« Last Edit: 04/05/2013 20:47:48 by David Cooper »

*

Offline wolfekeeper

  • Neilep Level Member
  • ******
  • 1092
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #32 on: 04/05/2013 21:03:11 »
Oh well, then you've disproved conservation of energy.

;)

In reality the situations depend subtly on the differences between what you do.

If you just have microphones, each with electronic amplification, then there's basically no connection between the microphones; but if you have simple solenoid-type magnets, the microphones can also act as loudspeakers, so that when one microphone gets pushed down by the air pressure another will try to pop up - so it's not at all the same situation.

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #33 on: 05/05/2013 18:32:02 »
Let me simplify things further for you. Imagine a gong with ten people hitting it with their bongers (correct technical term for gong-hitting implements not known). If they all hit the gong with their bongers at exactly the same instant, the amplitude of the sound produced will be ten times as great as if only one hits it. If they all hit it at different points in time though, with half of them hitting it when it's coming back towards them, they will cancel out a lot of the movement of the gong and it will be a lot quieter. It might be better to think of half of them hitting the gong from the other side, so if two bongers hit the gong at the same time from opposite sides, they will cancel each other out and create heat instead. If they're all deaf and blind, they will hit the gong at random times and produce an amplitude of root 10 with the sound energy supposedly being 10, but if they are able to coordinate the movement of their bongers perfectly they can generate a sustained amplitude of 10 with the sound energy supposedly being 100. It doesn't add up.

*

Offline wolfekeeper

  • Neilep Level Member
  • ******
  • 1092
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #34 on: 09/05/2013 15:33:54 »
Let me simplify things further for you. Imagine a gong with ten people hitting it with their bongers (correct technical term for gong-hitting implements not known). If they all hit the gong with their bongers at exactly the same instant, the amplitude of the sound produced will be ten times as great as if only one hits it.
There will be ten times more energy in the gong.
Quote
If they all hit it at different points in time though, with half of them hitting it when it's coming back towards them, they will cancel out a lot of the movement of the gong and it will be a lot quieter.
No, that's not right. If they hit it at different times, they will add ten times more energy to the gong, at different times. (To a pretty good approximation; it does depend a bit on precisely how it's struck.)
Quote
It might be better to think of half of them hitting the gong from the other side, so if two bongers hit the gong at the same time from opposite sides, they will cancel each other out and create heat instead.
Only if they hit it at EXACTLY the same time; then they will effectively not have hit the gong at all - each hammer will bounce back at exactly the speed it struck at, and no energy will be added to the gong. But in virtually any normal case, this perfect strike will not happen.
Quote
If they're all deaf and blind, they will hit the gong at random times and produce an amplitude of root 10 with the sound energy supposedly being 10, but if they are able to coordinate the movement of their bongers perfectly they can generate a sustained amplitude of 10 with the sound energy supposedly being 100. It doesn't add up.
The thing you can always hang your hat on is conservation of energy.

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #35 on: 09/05/2013 19:46:58 »
Let me simplify things further for you. Imagine a gong with ten people hitting it with their bongers (correct technical term for gong-hitting implements not known). If they all hit the gong with their bongers at exactly the same instant, the amplitude of the sound produced will be ten times as great as if only one hits it.
There will be ten times more energy in the gong.

I'm not sure it actually works though as it's harder to push something that's already moving away from you, and ten bongers would accelerate the gong faster such that it may be hard for each bonger to transfer as much energy to it. It may be better to go back to using an example with ten sets of electromagnets.

Quote
Quote
If they all hit it at different points in time though, with half of them hitting it when it's coming back towards them, they will cancel out a lot of the movement of the gong and it will be a lot quieter.
No, that's not right. If the hit it at different times, they will add ten times more energy to the gong at different times. (To a pretty good approximation, it does depend a bit on precisely what way it's struck).

No, it's like trying to push a child on a swing when they're moving towards you - you end up absorbing energy from them instead and they swing less far afterwards. You still have to work hard to do this, and the energy must become heat.

Quote
Quote
It might be better to think of half of them hitting the gong from the other side, so if two bongers hit the gong at the same time from opposite sides, they will cancel each other out and create heat instead.
Only if they hit it at EXACTLY the same time, then they will effectively not have hit the gong; the hammer will bounce back at exactly the same speed it was struck at, and no energy will be added to the gong, but in virtually any normal case, this perfect strike will not happen.

It can happen and it's vital for science to account for that case. If you have a speaker in which the moving part contains a fixed magnet which is then made to move using electromagnets, it is possible to have ten sets of electromagnets which attempt to move the fixed magnet. If they all apply a force of 1 unit each in the same direction at the same time (which is very easy to arrange), the fixed magnet will move with an amplitude of 10 units (while a single set of electromagnets applying a force of 1 would lead to an amplitude of 1). If you make five of the electromagnets apply their force in the opposite direction, the speaker magnet won't move at all. If you let the electromagnets apply force at random times, the amplitude will be root 10.

You can argue that 10 units of energy is put in in one case, no energy at all is put in in another case, and root 10 units of energy in the third, but in each case the energy will have been put into the electromagnets regardless of the result. The key point though is that if you do this, you can't then argue that the energy in the sound is the square of the amplitude, because the first case would need the sound energy to be 100 units. That is the point that needs to be addressed.

*

Offline wolfekeeper

  • Neilep Level Member
  • ******
  • 1092
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #36 on: 09/05/2013 20:46:25 »
The thing you're continually failing to understand is that the details, really, really matter.

The way the clappers hit the gong, the shape of the clappers, the timing of the clappers, whether the particular point on the gong that it hits is moving or not.

In the normal case, where you hit the gong with ten clappers, the energy added to the gong is ten times as much - and most of it will NOT end up at the centre; the wave energy spreads out in different directions from each clapper, falling off with distance, and only a small fraction ends up at the centre at all.

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #37 on: 10/05/2013 17:32:35 »
The thing you're continually failing to understand is that the details, really, really matter.

Of course the details matter, but you're chasing all manner of trivialities instead of tackling the point that really matters.

Quote
The way the clappers hit the gong, the shape of the clappers, the timing of the clappers, whether the particular point on the gong that it hits is moving or not.

Use your imagination and picture the difference between ten bongers hitting the gong at the same instant in the same direction and ten bongers hitting it with the same force but at random times such that some of them reduce movement of the gong rather than adding to it. There is a clear difference between these two cases and it will result in a different amplitude for the sound produced by the movement of the gong (though it's also important not to be confused by the sound of the bongers hitting the gong as these are additional sounds and not the main event - the bongers could actually be padded such that there is no impact sound while still transferring energy to make the gong bong).

Quote
In the normal case, where you hit the gong with ten clappers, the energy added to the gong is ten times the amount of energy- and most of it will NOT end up at the centre; the wave energy will spread out in different directions in an inverse law from each clapper, and only a small fraction ends up in the centre at all.

If you've got a really big gong, all ten bongers can hit it practically at the centre. In fact, we can eliminate this trivial problem altogether by having them hit the exact same central point, not at the same time but at intervals matching the period of the gong's vibration, so that the resulting waves line up exactly and add together without any cancellation. If they hit at random times instead, the amplitude after ten bongers have hit will be considerably less.

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #38 on: 17/08/2014 19:39:24 »
Update on the sound analysis program I was writing:-

The original way I tried to do it produced semi-useful results, but I switched to working with area instead and that worked much better, enabling me to isolate pure frequencies without any extra work to eliminate false results. The method can indeed be used as an alternative to FFT, though I don't know how well it compares in speed terms.

The method works by storing the area enclosed by the wave in alternating accumulators, with a stretch of the wave being divided up into equal-length chunks and the area of all the odd numbered chunks being added to a variable summing up the odd chunks while the even numbered chunks' areas are added to a different variable summing up the even chunks. It's a little more complicated than that because a high-frequency wave on top of a low frequency wave will often be superimposed on a steep slope which can generate false results, so you actually have to take the start and end altitudes of the chunk, average them, multiply by the length of the chunk and then subtract that from the area in order to isolate any area caused by a deviation away from a straight line. There is still some noise left over, but it is trivial.
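In outline, the scoring for one frequency works something like this (a rough sketch of the idea only, not the real code - all the names are mine, and the real thing avoids looping per frequency, as described below):

```python
import math

def chunk_area_score(samples, chunk_len):
    """Score how strongly a component with period 2*chunk_len is present:
    sum the areas of alternating chunks with opposite signs.  Each chunk's
    area is corrected by subtracting the trapezoid under the straight line
    joining its endpoints, which removes the bias from any slowly varying
    wave the target frequency is riding on top of."""
    score = 0.0
    sign = 1.0
    for start in range(0, len(samples) - chunk_len, chunk_len):
        chunk = samples[start:start + chunk_len]
        area = sum(chunk)
        # Trapezoid under the line joining the chunk's start and end altitudes:
        baseline = (chunk[0] + samples[start + chunk_len]) / 2.0 * chunk_len
        score += sign * (area - baseline)
        sign = -sign
    return abs(score)

# A 100 Hz sine sampled at 8 kHz: a matching chunk is half a period (40 samples)
rate, freq = 8000, 100
wave = [math.sin(2 * math.pi * freq * n / rate) for n in range(rate)]
half_period = rate // (2 * freq)
strong = chunk_area_score(wave, half_period)            # chunks aligned to the wave
weak = chunk_area_score(wave, half_period * 3 // 4)     # mismatched chunk length
assert strong > weak
```

A matching chunk length lets every half-cycle's area add to the total, while a mismatched one largely cancels out - which is the whole basis of the method.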

The process would be extremely slow if it were done carelessly, as there would be an enormous amount of adding up of areas for different frequencies, but it is only necessary to add results to one of the summing variables when a chunk ends and its variables need to be switched round. Each sample is simply added to a single running variable, shared by all frequencies, which keeps track of the signed area enclosed by the wave; I call that variable "totality". Each frequency has its own set of variables, one of which stores the value of totality the previous time the chunk for that frequency ran out, making it easy to work out the change in area since then. The previous altitude also has to be stored so that the area to be subtracted can be worked out.

Each frequency has two sets of variables rather than one so that they can analyse the wave from offset starting points; their results are then combined to work out the true amplitude (combined does not mean added - it's a little more complicated than that). I'm currently working with 8 octaves at quarter-tone resolution, but will probably switch to semitones or even whole tones to speed things up (because frequencies in between can still be worked out from that).

Each set of variables for a frequency is pointed to by a shorter entry in a buffer, and these entries are kept in expiry order - the ones that time out first get their clock reset and are put further back in the queue for next time. Sorting them into order is the slowest part of the process, but reducing the number of frequencies tested for will speed things up a lot. Another thing that will speed it up is avoiding testing all the time for the higher frequencies - they only need to be checked for occasionally to see if they're still active, so their clocks can be set to put them a long way down the queue, to the point that there's very little processing needing to be done at all.

I haven't done much work on optimising it yet though, because I'm still working on phoneme recognition with both spoken and whispered sounds. Another thing I plan to do is test how good the results are by trying to compress sound in the same kind of way as MP3 does, playing back the results to see how good/poor the sound quality is, but that may take some time. I'm concentrating on speech recognition first.
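The expiry-ordered queue could equally be kept as a binary heap rather than a re-sorted list; a hypothetical sketch of the idea (the frequency names and chunk lengths are made up for illustration):

```python
import heapq

# Each tracked frequency has a chunk length in samples; when its chunk
# expires we would flush its area totals and re-arm it.  A heap keyed on
# the next expiry sample keeps the soonest-expiring frequency on top, so
# each incoming sample only touches the frequencies that actually expire.
chunk_lens = {440: 18, 880: 9, 220: 36}    # hypothetical frequency -> samples per chunk
heap = [(clen, freq) for freq, clen in chunk_lens.items()]
heapq.heapify(heap)

fired = []
for now in range(40):                       # simulate 40 incoming samples
    while heap and heap[0][0] <= now:
        expiry, freq = heapq.heappop(heap)
        fired.append((now, freq))           # here: flush/switch this frequency's accumulators
        heapq.heappush(heap, (expiry + chunk_lens[freq], freq))

# The 880 Hz entry expires every 9 samples, so it fires most often
assert fired[0] == (9, 880)
```

The push/pop cost is logarithmic in the number of tracked frequencies, so a slow general-purpose sort never has to run at all.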

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #39 on: 18/08/2014 13:28:05 »
It seems that the new algorithm multiplies the input signal by a +/-1 square wave.

This will pick out different frequencies in the input signal, but:
  • The square wave consists of many different frequencies (f, 3f, 5f, etc), and so it will respond to many different frequencies. It may be better to multiply by a sine wave.
  • The square wave will not detect sine waves which are at 90 degrees phase to the square wave. It may be better to multiply by two square waves, at 90 degrees phase difference.
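Multiplying by two reference waves 90 degrees apart and combining the results might look like this in outline (a sketch with illustrative values, not a description of the program discussed above):

```python
import math

def tone_magnitude(samples, freq, rate):
    """Correlate the signal with a sine and a cosine at the target
    frequency; combining the two in quadrature makes the detector
    insensitive to the unknown phase of the incoming tone."""
    s = c = 0.0
    for n, x in enumerate(samples):
        angle = 2 * math.pi * freq * n / rate
        s += x * math.sin(angle)
        c += x * math.cos(angle)
    return math.hypot(s, c) * 2 / len(samples)   # ~amplitude of that component

rate = 8000
# One second of a 440 Hz tone at amplitude 0.5, with an arbitrary phase offset
wave = [0.5 * math.sin(2 * math.pi * 440 * n / rate + 1.3) for n in range(rate)]
assert abs(tone_magnitude(wave, 440, rate) - 0.5) < 0.01   # detected regardless of phase
assert tone_magnitude(wave, 523, rate) < 0.05              # absent frequency scores ~0
```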

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #40 on: 18/08/2014 17:39:02 »
I'm not doing anything with a square wave. The only multiplying being done is to calculate areas to subtract to eliminate errors, as explained below. I can't work out how the diagram creating thing on this forum works, so I'll have to do this with text graphics and hope it looks the same on your machine.

           ____
       -            -
     /                \
--/--------------\-------------- /----
 /                       \                 /
                            -  ____   -
  a    b    c    d    a'   b'   c'   d'  a

If you imagine that as a sine wave centred on the X-axis, adding up the area from a to a' by adding all the samples in that region together will contrast greatly with the negative area recorded from a' to the next a. By subtracting the latter from the former, you end up with a substantial result. If you start at c instead and add up the area to c', you get 0, and from c' to the next c will also record 0, so alignment is important if you are to be sure of detecting a signal. However, if you use both alignments rather than just one, you can combine the results to get the whole truth out of it.

The actual alignment you use still depends on luck, but if you happen to work from b to b' and then b' to the next b while also working from d to d' and d' to the next d, you will get two scores which can be adjusted according to their relative size to get a fairly accurate representation of the actual areas enclosed by the sine wave (done quickest by looking up a table to see how much adjustment to use). So, the alignment doesn't matter - you can do everything with just two starting positions.

Now imagine that the sine wave above is actually wandering about upon a much longer wavelength wave, so the horizontal line through the middle of it is centred not on Y=0, but on Y=100. The same process can be used as before, and when we subtract the area from a' to the next a from the area from the first a to a', we get exactly the same value for the area enclosed between the sine wave and the straight line drawn through the middle of it. What I do though is subtract the area between that line and the X-axis first, even though it makes no difference to the end result in this case.

The reason for subtracting this area is to cover cases where the straight(ish) line through the middle of the sine wave is running on a slant, which it will be most of the time because our small sine wave is weaving about on top of another sine wave with a much greater wavelength. That line is not entirely straight, but the errors are small. When that line is tilted, the area underneath it needs to be subtracted because it can distort the results badly otherwise: if the alternating totals for the descent start with a to a' followed by a' to the next a, followed by the next a to a', etc., then all the a to a' sections are bigger than the a' to a sections that follow, so a large false result will be building up as you go along, and this error may not be cancelled out on the way back up again - if the alignment on the way back up starts with a' to a instead of a to a', the error is doubled instead of cancelled, so a high frequency can be detected where there is no sound at that frequency at all.

By subtracting the area between the centreline of the wave we're trying to detect and the X-axis, we are left with the area enclosed by that wave itself, plus a small error due to the centreline of the wave not being quite straight, but that error will not be large enough to affect the result - it will just be low-level noise.

The result of using this method is that I'm getting very clean-looking data out when working with computer generated waves, actual recordings of notes from musical instruments, and from speech sounds where the base note and a variety of harmonics stand out as bright bands against a black background.
« Last Edit: 18/08/2014 19:36:21 by David Cooper »

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #41 on: 18/08/2014 22:09:25 »
OK, with this second description, it seems that the new algorithm consists of:
  • "Adding up the area", which is equivalent to the mathematical operation of integration. In the audio domain, it is also equivalent to a low-pass filter, which will attenuate higher frequencies, and be better at extracting the fundamental frequency.
  • Adjusting the boundaries near the zero crossings, which is a form of interpolation
  • Detecting zero crossings, which is a way to estimate frequency.
  • Separately accumulating the positive and negative areas, and then comparing them at the end. Mathematically, this is just the same as adding them all up (comparison is a form of subtraction)
The problem with using zero crossings is that it only considers the input values when the waveform is near the zero crossings - this is why interpolation is so important. The Fourier transform (or multiplying by a square/sine wave) uses the entire waveform, and so is more sensitive; interpolation is not needed for these algorithms.
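For illustration, the bare zero-crossing estimator is only a few lines; it works well for clean single tones but degrades quickly with noise or mixed frequencies, which is why the refinements above matter (a sketch only):

```python
import math

def zero_crossing_freq(samples, rate):
    """Estimate the dominant frequency by counting sign changes:
    a pure tone crosses zero twice per cycle."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings * rate / (2 * len(samples))

rate = 8000
wave = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
assert abs(zero_crossing_freq(wave, rate) - 440) < 2
```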

[The earlier description sounded like the input was partitioned into fixed-length sections which were accumulated separately - this is equivalent to multiplying by a square wave. The second description sounds like the positive and negative segments of the input waveform are accumulated separately.]

*

Offline alancalverd

  • Global Moderator
  • Neilep Level Member
  • *****
  • 4812
  • life is too short to drink instant coffee
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #42 on: 19/08/2014 07:54:59 »
The square root summation of noise has significant sociological implications.

If you have a crowd of N people with random motivations, you only need to consistently coordinate the actions and voting of √N in order to achieve a particular objective over a long period.

This is the mathematical basis of radical politics and successful religion. Beware the patient few!
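The square-root scaling behind this (and behind the original choir question) is easy to check numerically: the sum of N equal-amplitude sources with random phases has a typical magnitude that grows like √N, not N. A quick simulation (parameter values are arbitrary):

```python
import math, random

def resultant_amplitude(n_sources, trials=200, rng=random.Random(1)):
    """Average magnitude of the sum of n unit phasors with random phases."""
    total = 0.0
    for _ in range(trials):
        x = y = 0.0
        for _ in range(n_sources):
            phi = rng.uniform(0, 2 * math.pi)
            x += math.cos(phi)
            y += math.sin(phi)
        total += math.hypot(x, y)   # magnitude of the resultant
    return total / trials

a100 = resultant_amplitude(100)
a400 = resultant_amplitude(400)
# Quadrupling the sources roughly doubles the amplitude (sqrt scaling),
# i.e. about +6 dB, rather than +12 dB for coherent addition.
assert 1.5 < a400 / a100 < 2.5
```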
 
helping to stem the tide of ignorance

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #43 on: 19/08/2014 19:59:21 »
"Adding up the area", which is equivalent to the mathematical operation of integration. In the audio domain, it is also equivalent to a low-pass filter, which will attenuate higher frequencies, and be better at extracting the fundamental frequency.

It works equally well at high and low frequencies, though more noise may be appearing at the very high frequencies when lots of frequencies are present at once - I can't yet tell if it's all noise, as a lot of it is probably real signal, and it may even be that most of it is. What is certain though is that if I produce artificial waves with two frequencies present, one high and one low, both show up clearly.

When looking at more natural sounds where there is more complexity, a range of harmonics shows up with musical instruments and speech, and there's a fair bit of white noise with speech. What I'm seeing visually with speech is a strong base note with a strong harmonic an octave up, and with the sound "oo" that's practically all there is to the sound, apart from a little white noise at very high pitch caused by air being forced through a restricted opening. With "oh" there's a second harmonic coming in a fifth up (three and a half tones higher). With "aw" another strong harmonic comes in two octaves up from the base note, and more come in with "ah" while the lowest harmonic weakens. Similar things happen at the low frequency end with "ee", "ay", "e" (as in "bed") and "a" (as in "cat"), but there are higher frequency components of white noise with them which distinguish them from the other set. The "ugh" vowel in "bird" seems to dampen all of the harmonics.

I'm seeing enough detail to tell them apart by eye, so it should be possible to write code that can do the same. (I could tell them apart by eye with the original method too, even with false signals all over them, but the processing would have been more complicated and the extra code needed to eliminate the false signals would always have been slow - I never got round to trying it out, but put it on the shelf and got on with other work while waiting for the right idea to do it properly.)

Quote
Adjusting the boundaries near the zero crossings, which is a form of interpolation

Detecting zero crossings, which is a way to estimate frequency.

What I'm doing is detecting crossings not just at zero, but at any altitude where the wave repeatedly switches direction (up/down), and also detecting crossings of a steeply sloping line.

Quote
Separately accumulating the positive and negative areas, and then comparing them at the end. Mathematically, this is just the same as adding them all up (comparison is a form of subtraction)

It is indeed just adding them up - I'm adding up all the area between a sine wave of a specific frequency as it oscillates on top of a wandering line regardless of where that wandering line goes and what angle it is tilted at. Some of that recorded oscillation does not belong to the frequency I'm measuring, but it generally cancels out over a long enough stretch, causing the detection of false signals at some other frequencies, but at low enough levels for the real signals to outgun them. (I'm doing the same thing for 192 different frequencies adding up different alternating areas for each, but all done in one pass to cut the amount of processing required to a tiny fraction of what would be required if they were all counted up separately.)

Quote
The problem with using zero crossings is that it only considers the input values when the waveform is near the zero crossings - this is why interpolation is so important.

I'm collecting the same quality of data on crossings at other altitudes and on slopes by adjusting to make the wave I'm testing for act as if it is oscillating across the X-axis at all times, and I'm doing this for all frequencies.

Quote
The Fourier transform (or multiplying by a square/sine wave) uses the entire waveform, and so is more sensitive; interpolation is not needed for these algorithms.

If I could work out how FFT is done, I could try programming it, but I get lost as soon as "i" comes into an equation. I'm looking for a simpler way of doing things because of that, and I suspect that what I'm doing is closer to what the ear and brain does (and that it has to deal with the same noise issues). I don't have the brain's advantage of parallel processing though, so I'm looking for ways to speed up the process such as only adding up the area once for all the different frequencies at once instead of doing it individually for all 192 of them - the areas only get added to the variables for individual frequencies when they switch from looking for positive areas to looking for negative areas.
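For what it's worth, one well-known single-frequency detector that avoids complex numbers entirely is the Goertzel algorithm - just a two-term recurrence of real multiplies and adds per sample. A minimal sketch (a standard textbook method, not the area-based approach described above):

```python
import math

def goertzel_power(samples, freq, rate):
    """Goertzel algorithm: measure the power at one target frequency
    using a real-valued recurrence - no complex arithmetic needed."""
    w = 2 * math.pi * freq / rate
    coeff = 2 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of the detected component:
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

rate = 8000
wave = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
assert goertzel_power(wave, 440, rate) > goertzel_power(wave, 660, rate) * 100
```

It costs one multiply and two adds per sample per frequency, so for a modest number of frequencies it can beat an FFT.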

It is currently taking 40 times as long to analyse a sound file as the sound file takes to play (that's working on a slow Atom processor) [correction: 20 times as long, but 40x as long as an MP3 encoder running on the same machine which is doing the same job and more], but there are a number of things I can do to speed that up. I'm currently using 4 staggered offsets per frequency instead of 2, and there's no advantage in using 4 over 2; changing to using only 2 will better than halve the processing time, because it'll speed up the queue sorts even more.

I could also drop the quarter tones for another doubling in speed, and I should still be able to detect sounds at those frequencies using the ones to either side - I might not even need the semitones. Alternatively, I could retain them all but switch them in and out dynamically when they're required for testing exact pitch, but I don't want to rush into writing more complex code that may not work. I could drop the whole of the top octave as it doesn't show very much, and it uses more than twice as much processor time as the octave below it, 4 times as much as the one below that, and so on all the way down. I could switch over to occasional sampling of higher frequencies instead of testing for them continuously; I'm planning to do that first as it might speed things up by as much as 100 times (while all those frequencies could still be monitored continually for activity without the same degree of precision) [though this would depend on not monitoring stretches where a sound can be assumed to continue for a time without changing]. I could also drop one of the channels and just do mono, but I programmed it to be stereo from the outset as I want to be able to detect horizontal position at some stage to help separate out different voices [though dropping stereo would hardly speed it up at all].

Anyway, by doing most of those things, it should be able to process the data in real time as it comes in from a microphone. 
I'd like to try FFT as well, but I'll see how far I can get with this approach first. I want to test the quality of the processed signal by creating artificial waves at the right strengths for all the frequencies detecting activity and then to use them to build a new sound wave that should sound something like the original - that should give me a direct demonstration of how much noise has been added, but all that really matters is that the results are clear enough to detect speech sounds correctly, and the visual evidence shows me that sufficient detail is there. It's another matter entirely though working out how to probe the data with code to interpret it - lots of the experiments I did last time failed to detect subtle differences that I could see by eye. It should be easier this time as the signal stands out much more clearly, but I still expect it to take a lot of work, and success is not guaranteed.
« Last Edit: 20/08/2014 16:58:45 by David Cooper »

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #44 on: 19/08/2014 22:43:00 »
Quote
altitude [amplitude] where the wave repeatedly switches direction (up/down)
  • Detecting changes in direction is equivalent to the mathematical function of differentiation, or in the audio domain it is a high-pass filter.
  • But integration and differentiation are opposites of each other (an inverse function), so if you are differentiating and then integrating you will be doing a lot of processing work to end up back where you started.

In any computer program (especially doing signal processing, like this one), there are some central loops which may make up only 1% of the code, but take up a majority of the execution time. Focus in on these, and small changes can often make a large impact on processing time.
 
Have a look at a public domain MP3 encoder. This will include code for a FFT.

The MP3 encoder detects the main frequencies present, then uses a model of the human auditory system to throw away sounds which are not consciously audible to humans. This may "clean up" the signal so there is less to process - hopefully without throwing away sounds which are subconsciously audible!

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #45 on: 20/08/2014 17:20:48 »
Quote
altitude [amplitude] where the wave repeatedly switches direction (up/down)
  • Detecting changes in direction is equivalent to the mathematical function of differentiation, or in the audio domain it is a high-pass filter.
  • But integration and differentiation are opposites of each other (an inverse function), so if you are differentiating and then integrating you will be doing a lot of processing work to end up back where you started.

I'm definitely not doing and undoing anything - all I'm doing is calculating area and collecting totals with different boundaries. (I haven't been placing the boundaries with exact precision though, so the errors are highest at the highest frequencies, and that's where noise is being generated that is hiding any real signal. I need to reprogram that part of things, though the sounds in that area are higher than those that can pass through a telephone, so I could just drop them.)

Quote
In any computer program (especially doing signal processing, like this one), there are some central loops which may make up only 1% of the code, but take up a majority of the execution time. Focus in on these, and small changes can often make a large impact on processing time.

Indeed, and I know that it's my sort routine that's slowing things down the most. It was written to be lightning fast for one specific application but is very slow for most other purposes. I'm only using it for this because it's reliable, but I'll write a new one to go with this program later. By reducing the number of entries being moved down the queue at a time though, the sort speed becomes less important, to the point that a slow sort routine could be just as fast as a fast one due to the tiny amount of work being done, so I'm not going to rush into fixing that until I know if it's worth the effort.
 
Quote
Have a look at a public domain MP3 encoder. This will include code for a FFT.

I avoid looking at other people's code because it can interfere with your right to write and distribute your own, so I'll seek help with understanding FFT on a maths forum at some point instead. I'll continue with what I'm already doing first though, because I think it could be just as fast if it's done the right way, and it might even be faster.

*

Offline evan_au

  • Neilep Level Member
  • ******
  • 4246
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #46 on: 20/08/2014 22:33:38 »
Quote
I'm definitely not doing and undoing anything
Algorithms for integration and differentiation will look very different from each other, but that doesn't stop them being inverses of each other.

Just like the methods we are taught for doing multiplication and division "by hand" look very different from each other, but they are still inverse functions.

Quote
at the highest frequencies, and that's where noise is being generated that is hiding any real signal

Looking for changes in signal direction (differentiation) emphasises high frequencies, and emphasises noise.

You could try filtering the input signal to keep it within the human speech band - there is little useful information above 7kHz.
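One simple way to do that band-limiting is a windowed-sinc FIR low-pass applied before analysis; a sketch with illustrative values (the cutoff and tap count here are arbitrary choices, not recommendations):

```python
import math

def lowpass_fir(samples, cutoff, rate, taps=101):
    """Windowed-sinc FIR low-pass: attenuates content above `cutoff` Hz."""
    mid = taps // 2
    fc = cutoff / rate                      # normalised cutoff (cycles/sample)
    kernel = []
    for i in range(taps):
        n = i - mid
        h = 2 * fc if n == 0 else math.sin(2 * math.pi * fc * n) / (math.pi * n)
        h *= 0.54 + 0.46 * math.cos(math.pi * n / mid)   # Hamming window
        kernel.append(h)
    out = []
    for i in range(len(samples)):           # direct (centred) convolution
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i - j + mid
            if 0 <= idx < len(samples):
                acc += k * samples[idx]
        out.append(acc)
    return out

rate = 16000
low = [math.sin(2 * math.pi * 500 * n / rate) for n in range(rate // 4)]
high = [math.sin(2 * math.pi * 7500 * n / rate) for n in range(rate // 4)]
def rms(x): return math.sqrt(sum(v * v for v in x) / len(x))
filtered_low = lowpass_fir(low, 3500, rate)
filtered_high = lowpass_fir(high, 3500, rate)
assert rms(filtered_low) > 0.5      # 500 Hz passes nearly intact
assert rms(filtered_high) < 0.1     # 7.5 kHz is strongly attenuated
```

With the input pre-filtered like this, the high-frequency accumulators could be dropped entirely rather than patched up.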

*

Offline David Cooper

  • Neilep Level Member
  • ******
  • 1505
    • View Profile
Re: Choir volume: could more singers mean quieter?
« Reply #47 on: 21/08/2014 17:55:54 »
I've rewritten the code to eliminate most of the errors that were accumulating at the highest frequency end, and now it's a lot cleaner. I can finally see dramatic differences between the fricatives:
  • S - high pitched white noise in a narrow range
  • SH - extensive white noise covering several octaves
  • HL (the Welsh sound LL) - a bit less white noise at the highest end than with S and SH
  • HR - similar to HL, but more white noise lower down
  • KH - similar to HL, but with two patches of white noise and a gap between them, as well as less at the highest end
F and TH are distinct from all of those but not greatly different from each other - there's just a hint of more white noise at lower pitch for F. Both of these have white noise in the same place as S, but the zone stretches down twice as deep, though nothing like as deep as SH.

So, the quality's certainly good enough to work with now. The next thing to do is optimise the code to make it more practical to work with, and then I'll start writing routines to identify phonemes. After that, I'll have to build a phonetic dictionary, bat aj qlredj hxv a wj qv tajpik wurdz in fqneticlj hwitsh wil mjc dhxt tasc yzy (but I already have a way of typing words in phonetically which will make that task easy). ajv byn ywzik it fqr menj yyrz (I've been using it for many years).