
"Debugging" theory of what music is

🔗WarrenS <warren.wds@gmail.com>

10/2/2011 2:36:01 PM

Hello... I'm a mathematician.
I invented a hypothesis, combining elements of computer science, biology, psychology,
and music theory, about "what music is." That is, music seems evolutionarily senseless
on the face of it (why waste time banging sticks and blowing through pipes when you could be searching for food?). Nevertheless, there are mechanisms built into humans that seem to have evolved to enable us to be musical. An apparent contradiction.

So... the theory in a nutshell is: music is a "debugging tool" for the human brain & auditory
system. It's evolutionarily valuable. It would seem important to understand this theory (if it is true) if you want to be a good composer.

I wrote two manuscripts, one too long, and the other too short, describing the theory, comparing to other theories, and examining them in light of a ton of evidence, including "tuning mathematics" but also a lot of other things (over 200 evidence items). You can read the manuscripts online here:

"What is music" by Warren D. Smith:
http://dl.dropbox.com/u/3507527/MusicTh.html long version
http://dl.dropbox.com/u/3507527/MusicThShort.html short version

Perhaps this group will provide a lot of comments...

🔗genewardsmith <genewardsmith@sbcglobal.net>

10/2/2011 3:07:15 PM

--- In tuning-math@yahoogroups.com, "WarrenS" <warren.wds@...> wrote:

> Perhaps this group will provide a lot of comments...

My first comment is that the main tuning list would be a better fit unless you are going to get into serious mathematics.

🔗Mike Battaglia <battaglia01@gmail.com>

10/2/2011 3:09:32 PM

On Sun, Oct 2, 2011 at 5:36 PM, WarrenS <warren.wds@gmail.com> wrote:
>
> Perhaps this group will provide a lot of comments...

Hi Warren - this sounds really interesting! This is actually the first
theory I've heard from a first-time outsider posting on this list that
sounds like a good idea right off the bat. I'll be sure to check it
out.

For reference though, you might be better off posting this stuff to
tuning@yahoogroups.com - this list is generally more about math
specifically. You might also get a good audience at the Xenharmonic
Alliance group on Facebook, if you use that, where a lot of theorists
both past and present spend their time.

-Mike

🔗Keenan Pepper <keenanpepper@gmail.com>

10/4/2011 9:56:43 AM

--- In tuning-math@yahoogroups.com, "WarrenS" <warren.wds@...> wrote:
>
> Hello... I'm a mathematician.
> I invented a hypothesis, combining elements of computer science, biology, psychology,
> and music theory, about "what music is." That is, music seems evolutionarily senseless
> on the face of it (why waste time banging sticks and blowing through pipes when you could be searching for some food?) But nevertheless there are mechanisms built into humans that seem to have evolved to enable us to be musical. Apparent contradiction.
>
> So... the theory in a nutshell is: music is a "debugging tool" for the human brain & auditory
> system. It's evolutionarily valuable. It would seem important to understand this theory (if it is true) if you want to be a good composer.
>
> I wrote two manuscripts, one too long, and the other too short, describing the theory, comparing to other theories, and examining them in light of a ton of evidence, including "tuning mathematics" but also a lot of other things (over 200 evidence items). You can read the manuscripts online here:
>
> "What is music" by Warren D. Smith:
> http://dl.dropbox.com/u/3507527/MusicTh.html long version
> http://dl.dropbox.com/u/3507527/MusicThShort.html short version
>
> Perhaps this group will provide a lot of comments...

I haven't had time to read the whole long version yet, but here are some brief comments:

First of all, if you don't know what "harmonic entropy" is, look it up immediately and read about it. You're gonna love it. We can quantify how clearly a given interval represents small-number ratios. You should also be delighted to know that the standard ("meantone" or "syntonic" tempered) pentatonic and diatonic scales are unique in minimizing average harmonic entropy over all scales of a certain general form. So this not only explains why we use 12 equal divisions of the octave (it's a very strong minimum of harmonic entropy among equal divisions), but also why we use the particular pentatonic and diatonic scales found in it. (Note that in many ways the scales are more fundamental than the 12edo system.)
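To make the "harmonic entropy" idea concrete, here is a toy sketch. The weighting scheme (each ratio n/d weighted by 1/sqrt(n*d) times a Gaussian of the mismatch, with an assumed smearing width s) is a simplification for illustration, not the canonical Farey-series construction; the qualitative behavior (deep minima at simple ratios) is the same.

```python
# Toy sketch of harmonic entropy, NOT the canonical definition: weight each
# ratio n/d (in lowest terms, from 1/1 up to 4/1) by 1/sqrt(n*d) and a
# Gaussian of the mismatch in cents, normalize to probabilities, and take
# the Shannon entropy. s and max_d are assumed illustrative parameters.
import math

def harmonic_entropy(cents, s=17.0, max_d=50):
    weights = []
    for d in range(1, max_d + 1):
        for n in range(d, 4 * d + 1):
            if math.gcd(n, d) == 1:
                mismatch = cents - 1200.0 * math.log2(n / d)
                w = math.exp(-mismatch**2 / (2 * s * s)) / math.sqrt(n * d)
                weights.append(w)
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

# A just fifth (702 cents) sits near a deep entropy minimum (3/2 dominates);
# an interval in the no-man's-land near 650 cents does not.
```

With this simplified model one can check numerically that, e.g., `harmonic_entropy(702.0)` comes out well below `harmonic_entropy(651.0)`.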

Second, a reply to one specific part:

"The only rival hypothesis, suggested by Bob Fink, is that small-integer frequency ratios arose naturally as "overtones," for example of vibrating strings, and perhaps there is some evolutionary fitness value in being able to identify such things aurally. This may not be entirely false, but Fink appears not to have realized that this is only true for idealized 1D strings, whose vibrational modes are sine waves. Generic vibrating 3D and 2D objects and cavities do not exhibit integer frequency ratios."

The human voice (singing a vowel, for example) is also periodic, so its Fourier spectrum consists of perfectly harmonic overtones and nothing else. Since your theory is based in large part on the evolutionary advantage of the ability to understand speech (right?), this seems highly relevant. I'll quote from Carl Lumma's "Tuning FAQ":

" Q: Critical band interactions are strongly supported by
physiological and psychoacoustic evidence. What's the evidence
for an innate affinity towards simple rational intervals?

A: Sounds with inharmonic spectra do not evoke well-resolved
pitches, as do sounds with harmonic spectra. This "virtual
pitch" phenomenon has been studied in psychoacoustics:
http://www.mmk.ei.tum.de/persons/ter/top/virtualp.html

As social animals, humans are highly adapted to extract
information from speech sounds. Human vocal folds produce rich
spectra with perfectly harmonic overtones, and vowel sounds in
all natural languages are defined by selectively boosting or
cutting regions of those spectra by using the vocal tract as a
resonant filter:
http://en.wikipedia.org/wiki/Formant

Thus, a hearing system that is able to identify harmonic spectra
as single sources and continuously characterize their spectral
balance over time has high adaptive significance for humans. It
is noted that Western tonal music produces spectra with similar
characteristics to that of speech."

Note that some confusion is possible because the resonances of the throat and mouth cavities are indeed inharmonic, but since in normal speech they're merely acting as filters for the periodic signal coming from the vocal folds, they only change the amplitudes of the various harmonic overtones (creating "formants"). The human voice is still a perfectly harmonic sound.

Last, a small error:

"To Rameau, two triad chords, the "major triad" [4,3] and its inversion the "minor triad" [3,4] – corresponding approximately to note-frequency ratios 6:8:9 and 4:5:6 respectively..."

6:8:9 is not a minor triad at all, but a suspended fourth chord ([5,2] in your notation). 6:7:9 could be a minor triad, but the most common version of the minor triad in just intonation is the mathematically correct inversion of 4:5:6, that is,

1/6:1/5:1/4 = 10:12:15
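The inversion arithmetic can be checked mechanically with exact rationals, e.g.:

```python
# Sanity-checking the inversion above: inverting the major triad 4:5:6
# gives 1/6 : 1/5 : 1/4, which clears denominators to 10:12:15.
from fractions import Fraction
from math import lcm

major = [4, 5, 6]
inverted = [Fraction(1, n) for n in reversed(major)]   # 1/6, 1/5, 1/4
scale = lcm(*major)                                    # lcm(4, 5, 6) = 60
minor = [int(r * scale) for r in inverted]             # [10, 12, 15]
```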

Very interesting paper!

Keenan

🔗WarrenS <warren.wds@gmail.com>

10/5/2011 4:43:22 PM

> > "What is music" by Warren D. Smith:
> > http://dl.dropbox.com/u/3507527/MusicTh.html long version
> > http://dl.dropbox.com/u/3507527/MusicThShort.html short version

> First of all, if you don't know what "harmonic entropy" is, look it up immediately and read about it.

--http://sethares.engr.wisc.edu/paperspdf/HarmonicEntropy.pdf
I presume? This "entropy" appears to be a nicer and more-elegant measure than the ad hoc one I was using in my tuning section, but both would accomplish the same purpose
for me. (I don't think this improvement, if it is an improvement, tremendously matters -- but, nice to know and I should cite it.)

>You're gonna love it. We can quantify how clearly a given interval represents small-number ratios. You should also be delighted to know that the standard ("meantone" or "syntonic" tempered) pentatonic and diatonic scales are unique in minimizing average harmonic entropy over all scales of a certain general form. So this not only explains why we use 12 equal divisions of the octave (it's a very strong minimum of harmonic entropy among equal divisions), but also why we use the particular pentatonic and diatonic scales found in it. (Note that in many ways the scales are more fundamental than the 12edo system.)

--ok, that sounds nice.

> Second, a reply to one specific part:
>
> "The only rival hypothesis, suggested by Bob Fink, is that small-integer frequency ratios arose naturally as "overtones," for example of vibrating strings, and perhaps there is some evolutionary fitness value in being able to identify such things aurally. This may not be entirely false, but Fink appears not to have realized that this is only true for idealized 1D strings, whose vibrational modes are sine waves. Generic vibrating 3D and 2D objects and cavities do not exhibit integer frequency ratios."
>
> The human voice (singing a vowel, for example) is also periodic, so its Fourier spectrum consists of perfectly harmonic overtones and nothing else.

--uh, not sure what that meant. Every L2 signal is Fourier decomposable, and every linear elastic continuum system has sinewave time dependence for eigenmodes...
so if you're saying the human voice does too, that's a tautology.
And certainly the human voice does not only consist of integer frequency ratios, although
I suppose talented singers can make that more true.

>Since your theory is based in large part on the evolutionary advantage of the ability to understand speech (right?), this seems highly relevant. I'll quote from Carl Lumma's "Tuning FAQ":

--- http://lumma.org/tuning/faq/
I presume?

> " Q: Critical band interactions are strongly supported by
> physiological and psychoacoustic evidence. What's the evidence
> for an innate affinity towards simple rational intervals?
>
> A: Sounds with inharmonic spectra do not evoke well-resolved
> pitches, as do sounds with harmonic spectra.

--what's an "inharmonic spectrum" and what's a "harmonic spectrum"?
That faq does not define these.

> This "virtual
> pitch" phenomenon has been studied in psychoacoustics:
> http://www.mmk.ei.tum.de/persons/ter/top/virtualp.html
>
> As social animals, humans are highly adapted to extract
> information from speech sounds. Human vocal folds produce rich
> spectra with perfectly harmonic overtones, and vowel sounds in
> all natural languages are defined by selectively boosting or
> cutting regions of those spectra by using the vocal tract as a
> resonant filter:
> http://en.wikipedia.org/wiki/Formant
>
> Thus, a hearing system that is able to identify harmonic spectra
> as single sources and continuously characterize their spectral
> balance over time has high adaptive significance for humans. It
> is noted that Western tonal music produces spectra with similar
> characteristics to that of speech."

--While I was saying something similar to that in spirit,
the precise reasoning in the above quote seems to me rather muddled
and bogus. I think the value of being able to recognize integer frequency ratios is not that the human voice has them; it is that if you can do this, then you've got an excellent
frequency-calibration tool that you can use to help keep your audio-reception working & calibrated well. A competing animal without integer ratio recognition would
find their audio reception getting more "out of tune" and "losing calibration" and hence would be a less effective linguist.

> Note that some confusion is possible because the resonances of the throat and mouth cavities are indeed inharmonic, but since in normal speech they're merely acting as filters for the periodic signal coming from the vocal folds, they only change the amplitudes of the various harmonic overtones (creating "formants"). The human voice is still a perfectly harmonic sound.
>
> Last, a small error:

> "To Rameau, two triad chords, the "major triad" [4,3] and its inversion the "minor triad" [3,4] – corresponding approximately to note-frequency ratios 6:8:9 and 4:5:6 respectively..."
>
> 6:8:9 is not a minor triad at all, but a suspended fourth chord ([5,2] in your notation). 6:7:9 could be a minor triad, but the most common version of the minor triad in just intonation is the mathematically correct inversion of 4:5:6, that is,
>
> 1/6:1/5:1/4 = 10:12:15
>
> Very interesting paper!
> Keenan

--thanks... I appear to have screwed that bit up somehow
(doesn't really matter, but I need to fix it)
Wikipedia
http://en.wikipedia.org/wiki/Major_chord
says freq ratio 4:5:6 is a major triad in just intonation and is [4,7] in semitones notation;
and
http://en.wikipedia.org/wiki/Minor_chord
says the minor triad is [3,7] in semitones, with freq ratio 10:12:15.
Will try to repair later.
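As a quick cross-check of those Wikipedia figures, converting each just ratio to 12-equal semitones via 12*log2(ratio) and rounding recovers the semitone notations:

```python
# Converting just-intonation frequency ratios to (rounded) 12-equal
# semitones: 4:5:6 gives steps [4, 7] and 10:12:15 gives steps [3, 7].
import math

def semitones(ratio):
    return 12 * math.log2(ratio)

major_steps = [round(semitones(5 / 4)), round(semitones(6 / 4))]      # [4, 7]
minor_steps = [round(semitones(12 / 10)), round(semitones(15 / 10))]  # [3, 7]
```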

incidentally it seems dropbox has changed their software so that the pictures
in my long manuscript are now invisible(?). Rats. I'm trying to make a personal
website elsewhere and when I get that done the manuscript will move
there and the pictures will be restored. Sorry about that.

Anyhow, the whole tuning area is only one part of my paper. I just found out about
this here yahoogroup and I'm not sure what to make of it. I'm actually somewhat amazed it even exists. Is there some "introduction"
allowing the newcomer to find out more about what is in this yahoo group and
how it all fits together?

🔗WarrenS <warren.wds@gmail.com>

10/5/2011 4:50:08 PM

> > > "What is music" by Warren D. Smith:
> > > http://dl.dropbox.com/u/3507527/MusicTh.html long version
> > > http://dl.dropbox.com/u/3507527/MusicThShort.html short version

--dropbox appears(?) to have repaired their software so pictures in my manuscript
are once again visible.

🔗Keenan Pepper <keenanpepper@gmail.com>

10/5/2011 7:49:23 PM

--- In tuning-math@yahoogroups.com, "WarrenS" <warren.wds@...> wrote:
> > The human voice (singing a vowel, for example) is also periodic, so its Fourier spectrum consists of perfectly harmonic overtones and nothing else.
>
> --uh, not sure what that meant. Every L2 signal is fourier decomposable, and every linear elastic continuum system has sinewave time dependence for eigenmodes...
> so if you're saying the human voice does too, that's a tautology.
> And certainly the human voice does not only consist of integer frequency ratios, although
> I suppose talented singers can make that more true.

Really? You really don't know what I'm talking about?

The sound of a vowel being sung at constant pitch is approximately periodic, *as a whole*, which means all the different Fourier components are locked in a definite phase relationship. The frequencies are all commensurate. This is quite different from the situation with a "linear elastic continuum" (which the voice is not), in which each Fourier component is periodic by definition, but the sum of all the components is not periodic because the frequencies are incommensurate and the phases are not locked.
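The distinction can be illustrated synthetically (this is not a model of the voice, just a sum of sines with made-up partial frequencies): a sum of exact integer harmonics repeats after one fundamental period, while a sum of incommensurate partials does not.

```python
# Compare a signal of commensurate (harmonic) partials with one of
# incommensurate partials by measuring how well the waveform matches
# itself one period later.
import math

def signal(freqs, t):
    return sum(math.sin(2 * math.pi * f * t) for f in freqs)

def shift_error(freqs, period, n=2000, dt=1e-5):
    # RMS difference between x(t) and x(t + period) over n samples
    err = 0.0
    for k in range(n):
        t = k * dt
        d = signal(freqs, t + period) - signal(freqs, t)
        err += d * d
    return math.sqrt(err / n)

harmonic = [200, 400, 600, 800]      # commensurate: period exactly 1/200 s
inharmonic = [200, 502, 886, 1351]   # incommensurate partials (made up)
```

Shifting the harmonic signal by 1/200 s reproduces it essentially exactly; shifting the inharmonic one by any candidate period leaves a large residual.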

Being a "talented singer" has absolutely nothing to do with it. Anyone capable of singing a constant pitch without drifting significantly can demonstrate this with anything that can show you a Fourier spectrum.

I made my own demonstration of this just to show how easy it is to do and how obvious the results are.

First, I struck a glass bowl and recorded the bell-like sound it produced. The waveform,

http://f1.grp.yahoofs.com/v1/oAuNTlgVs8reAzfTATy3qNNUgmt7jEvffkTuH6SjkEN8s9ANv4nMKMptJNwa75JzfbeXuS2Cf6A_opJRfa8q_N-MdNrSFnEnK5yh/Keenan%20Pepper/bowl-waveform.png

has evidence of periodicities in it but the waveform as a whole is not periodic. The spectrum of this sound,

http://f1.grp.yahoofs.com/v1/oAuNTuQNm-neAzfTbZB0fTJM41T20Fh_VPR36d8fPqeoqpihTEyr92sjP5_eCh4POz7d0SiL80DTIfg946gswEdpKBGiSAJ7YBRk/Keenan%20Pepper/bowl.pdf

shows very clear discrete peaks, and the frequencies of the first four are approximately

336, 842, 1490, 2268 Hz

or relative to the lowest frequency,

f, 2.51 f, 4.43 f, 6.75 f

These are clearly not simple integer ratios, which is consistent with the observation that the waveform is not nearly periodic.

Next, I recorded myself singing a vowel (something like the "a" in "cat"). I am by no means a talented singer, but I am able to hold the pitch steady (within 10 or 20 cents I imagine). The waveform,

http://f1.grp.yahoofs.com/v1/oAuNThf5K27eAzfTSg7ywZ0PihmUy38yUiwPyEfQZ2--EriEDmZfwKdXnc5WVnX7npoSLs4RXpTlrSMf9xihsf1HnxOG-XDXD_Vn/Keenan%20Pepper/voice-waveform.png

is quite different from that of the bowl because in this case there is a single obvious period for the waveform *as a whole*. Although different periods are not identical, they resemble each other very strongly. Nothing appears to be varying with an incommensurate frequency.

The spectrum of this vowel sound,

http://f1.grp.yahoofs.com/v1/oAuNTh4zypveAzfTFv8_lMPQwQH3IyhnB-u_alCljqQmX2ePeJuNuCjyu8TSJTwFmPxLiDT2gRCUBYxDTD737XmPPPiGE3t52Bxg/Keenan%20Pepper/voice.pdf

is very obviously different from that of the bowl, because all of the important peaks line up *perfectly* at integer multiples of the fundamental frequency. The frequencies of the first 10 peaks are:

162, 324, 487, 649, 811, 974, 1136, 1299, 1462, 1625

which are *indistinguishable* (given the 2-3 Hz spectral resolution) from harmonics f, 2f, 3f... 10f.
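Redoing the arithmetic on those quoted peak frequencies makes the contrast explicit: a fitted fundamental for the voice puts every peak at an integer multiple to within the stated 2-3 Hz resolution, while the bowl partials land nowhere near integer multiples of the lowest mode.

```python
# Peak frequencies as quoted in the thread (Hz).
bowl = [336, 842, 1490, 2268]
voice = [162, 324, 487, 649, 811, 974, 1136, 1299, 1462, 1625]

# Bowl partials relative to the lowest mode: ~1.00, 2.51, 4.43, 6.75
bowl_ratios = [f / bowl[0] for f in bowl]

# Crude fitted fundamental for the voice (average of f_k / k), and each
# peak's deviation from the nearest exact harmonic of it.
f0 = sum(f / (k + 1) for k, f in enumerate(voice)) / len(voice)
voice_dev = [abs(f - (k + 1) * f0) for k, f in enumerate(voice)]
```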

This is what I'm talking about. The sound of a "linear elastic continuum", or idiophone, is aperiodic, so its spectrum consists of incommensurate frequencies corresponding to its eigenmodes. The sound of the human voice, in contrast, is effectively periodic, and its spectrum consists of practically exact integer harmonics. If I were able to take a huge breath and sing an absolutely steady note for 100 seconds, you could verify that the harmonics were exact integer multiples to within the 0.01 Hz precision that such a recording affords. They're not integer multiples because of any kind of fine-tuning; they're integer multiples because the physical mechanism of sound production forces them to be locked in phase.

Now, of course, in normal speech nobody holds a vowel out at a constant pitch like that; the pitch of the whole thing is modulated. But the phase locking phenomenon is still present. If the fundamental changes by a certain interval, all the harmonics change by exactly that same interval and remain integer harmonics.

> >Since your theory is based in large part on the evolutionary advantage of the ability to understand speech (right?), this seems highly relevant. I'll quote from Carl Lumma's "Tuning FAQ":
>
> --- http://lumma.org/tuning/faq/
> I presume?

Yes.

> --While I was saying something similar to that in spirit,
> the precise reasoning in the above quote seems to me rather muddled
> and bogus. I think the value of being able to recognize integer frequency ratios is not that the human voice has them; it is that if you can do this, then you've got an excellent
> frequency-calibration tool that you can use to help keep your audio-reception working & calibrated well. A competing animal without integer ratio recognition would
> find their audio reception getting more "out of tune" and "losing calibration" and hence would be a less effective linguist.

I think if you take the time to absorb what I said above and admit that the voice is fundamentally a periodic, harmonic sound, you'll re-evaluate Carl's FAQ entry (which I think is crystal clear and insightful).

> Anyhow, the whole tuning area is only one part of my paper. I just found out about
> this here yahoogroup and I'm not sure what to make of it. I'm actually somewhat amazed it even exists. Is there some "introduction"
> allowing the newcomer to find out more about what is in this yahoo group and
> how it all fits together?

We're constantly working on things like this. For an introduction to the theories we've been developing and xenharmonic music in general, try http://xenharmonic.wikispaces.com/

As for the "tuning-math" group in particular, I believe it was only created in the first place because some people in the main tuning group objected to all the discussion of "exterior algebra", "homomorphisms of free abelian groups", "wedgies", "mapping matrices", and other math topics they (justifiably) thought were hardly related to music making. The "MakeMicroMusic" group is even more focused on practice and compositions rather than theory.

Your ideas are definitely of general interest, so any new discussions should be started on the main tuning group.

Keenan

🔗Keenan Pepper <keenanpepper@gmail.com>

10/6/2011 7:16:26 AM

For some reason the links I posted aren't working. But you can just go to my folder,

/tuning-math/files/Keenan%20Pepper/

and look at the following files:

bowl-waveform.png
bowl.pdf
voice-waveform.png
voice.pdf

Keenan

🔗WarrenS <warren.wds@gmail.com>

10/6/2011 7:17:52 AM

> Really? You really don't know what I'm talking about?
>
> The sound of a vowel being sung at constant pitch is approximately periodic, *as a whole*, which means all the different Fourier components are locked in a definite phase relationship. The frequencies are all commensurate. This is quite different from the situation with a "linear elastic continuum" (which the voice is not), in which each Fourier component is periodic by definition, but the sum of all the components is not periodic because the frequencies are incommensurate and the phases are not locked.
>
> Being a "talented singer" has absolutely nothing to do with it. Anyone capable of singing a constant pitch without drifting significantly can demonstrate this with anything that can show you a Fourier spectrum.
>
> I made my own demonstration of this just to show how easy it is to do and how obvious the results are.
>
> First, I struck a glass bowl and recorded the bell-like sound it produced. The waveform,
>
> http://f1.grp.yahoofs.com/v1/oAuNTlgVs8reAzfTATy3qNNUgmt7jEvffkTuH6SjkEN8s9ANv4nMKMptJNwa75JzfbeXuS2Cf6A_opJRfa8q_N-MdNrSFnEnK5yh/Keenan%20Pepper/bowl-waveform.png

--all these URLs you are giving with the waveforms and spectra -- my browser says "file not found" for them all, unfortunately. I'd love to see them; perhaps you could post them in the FILES section of this yahoogroup?
I might even want to steal them (with credit) for use in my paper
(if you permit?).

> has evidence of periodicities in it but the waveform as a whole is not periodic. The spectrum of this sound,
>
> http://f1.grp.yahoofs.com/v1/oAuNTuQNm-neAzfTbZB0fTJM41T20Fh_VPR36d8fPqeoqpihTEyr92sjP5_eCh4POz7d0SiL80DTIfg946gswEdpKBGiSAJ7YBRk/Keenan%20Pepper/bowl.pdf
>
> shows very clear discrete peaks, and the frequencies of the first four are approximately
>
> 336, 842, 1490, 2268 Hz
>
> or relative to the lowest frequency,
>
> f, 2.51 f, 4.43 f, 6.75 f
>
> These are clearly not simple integer ratios, which is consistent with the observation that the waveform is not nearly periodic.
>
> Next, I recorded myself singing a vowel (something like the "a" in "cat"). I am by no means a talented singer, but I am able to hold the pitch steady (within 10 or 20 cents I imagine). The waveform,
>
> http://f1.grp.yahoofs.com/v1/oAuNThf5K27eAzfTSg7ywZ0PihmUy38yUiwPyEfQZ2--EriEDmZfwKdXnc5WVnX7npoSLs4RXpTlrSMf9xihsf1HnxOG-XDXD_Vn/Keenan%20Pepper/voice-waveform.png
>
> is quite different from that of the bowl because in this case there is a single obvious period for the waveform *as a whole*. Although different periods are not identical, they resemble each other very strongly. Nothing appears to be varying with an incommensurate frequency.
>
> The spectrum of this vowel sound,
>
> http://f1.grp.yahoofs.com/v1/oAuNTh4zypveAzfTFv8_lMPQwQH3IyhnB-u_alCljqQmX2ePeJuNuCjyu8TSJTwFmPxLiDT2gRCUBYxDTD737XmPPPiGE3t52Bxg/Keenan%20Pepper/voice.pdf
>
> is very obviously different from that of the bowl, because all of the important peaks line up *perfectly* at integer multiples of the fundamental frequency. The frequencies of the first 10 peaks are:
>
> 162, 324, 487, 649, 811, 974, 1136, 1299, 1462, 1625
>
> which are *indistinguishable* (given the 2-3 Hz spectral resolution) from harmonics f, 2f, 3f... 10f.

--fascinating. I had no idea this was the case. Why in hell IS it the case?
I mean, how does one's voice manage to do that, physically? It would be explained if
your vocal chords were in fact, stretched chords like guitar strings, but actually
"vocal chords" are kind of fleshy flaps, not chords at all, so I see no physical reason this ought to be true. It also would be explained if you were setting up resonant 1-dimensional modes in the air inside a long organ pipe, but again, the human vocal cavity is not at all
a long 1D pipe. Perhaps, then, the only possible explanation is some kind of active neuro-muscular feedback control system keeping everything exactly periodic? But this only seems to happen when singing a vowel without change -- it is not the case in normal song-singing and speaking, right?

The fact that humans have this additional vocal capability (which I had not known about) I guess has to be regarded as further evidence for my whole "debugging theory" since
why else would it be there?

Hmm, I guess this gives a mathematical meaning to the (previously rather undefined) term "vowel"???!!!
[Which is rather weird since then the definition of "vowel" would basically be "it is consonant" i.e. exactly misnamed!]

If so, then this whole capability is useful for speech recognition?
That you might say, UNDERMINES my theory since it offers an alternative explanation
(although there's no reason the two explanations couldn't co-exist & synergise).
To verify this, you'd really have to look at several hundred waveforms for different speech sounds (consonants, vowels, and combination phonemes).

Here's a random paper about phonemes and waveforms which offers absolutely no
recognition of the whole "vowels are exactly periodic" hypothesis:
http://eprints.pascal-network.org/archive/00005169/01/final.pdf
I've read quite a lot of papers from the machine speech recognition community, and all
the ones I noticed so far offered no recognition of this whole idea, although they do not specifically refute it either.

Wikipedia
http://en.wikipedia.org/wiki/Phoneme
claims different languages have different numbers of vowels and consonants; it says
the Bantu language Ngwe has a (high) total of 38 vowels,
and that the Ubykh language has about 80 consonants.

Wikipedia "Vowel" says
"In phonetics, a vowel is a sound... pronounced with an open vocal tract so that there is no build-up of air pressure at any point above the glottis. This contrasts with consonants, where there is a [partial or complete] constriction or closure at some point along the vocal tract."

And what if you are just speaking a vowel normally, rather than singing it with intentional effort to keep it at a constant pitch?

And what if you sing some consonants?

> This is what I'm talking about. The sound of a "linear elastic continuum", or idiophone, is aperiodic, so its spectrum consists of incommensurate frequencies corresponding to its eigenmodes. The sound of the human voice, in contrast, is effectively periodic, and its spectrum consists of practically exact integer harmonics. If I were able to take a huge breath and sing an absolutely steady note for 100 seconds, you could verify that the harmonics were exact integer multiples to within the 0.01 Hz precision that affords you. They're not integer multiples because of any kind of fine-tuning; they're integer multiples because the physical mechanism of sound production forces them to be locked in phase.
>
> Now, of course, in normal speech nobody holds a vowel out at a constant pitch like that; the pitch of the whole thing is modulated. But the phase locking phenomenon is still present. If the fundamental changes by a certain interval, all the harmonics change by exactly that same interval and remain integer harmonics.
>
> > >Since your theory is based in large part on the evolutionary advantage of the ability to understand speech (right?), this seems highly relevant. I'll quote from Carl Lumma's "Tuning FAQ":
> >
> > --- http://lumma.org/tuning/faq/
> > I presume?
>
> Yes.
>
> > --While I was saying something similar to that in spirit,
> > the precise reasoning in the above quote seems to me rather muddled
> > and bogus. I think the value of being able to recognize integer frequency ratios is not that the human voice has them; it is that if you can do this, then you've got an excellent
> > frequency-calibration tool that you can use to help keep your audio-reception working & calibrated well. A competing animal without integer ratio recognition would
> > find their audio reception getting more "out of tune" and "losing calibration" and hence would be a less effective linguist.
>
> I think if you take the time to absorb what I said above and admit that the voice is fundamentally a periodic, harmonic sound, you'll re-evaluate Carl's FAQ entry (which I think is crystal clear and insightful).

--well, you are making an impressive point, though it'd be more impressive
if I could actually see those pictures...

> > Anyhow, the whole tuning area is only one part of my paper. I just found out about
> > this here yahoogroup and I'm not sure what to make of it. I'm actually somewhat amazed it even exists. Is there some "introduction"
> > allowing the newcomer to find out more about what is in this yahoo group and
> > how it all fits together?
>
> We're constantly working on things like this. For an introduction to the theories we've been developing and xenharmonic music in general, try http://xenharmonic.wikispaces.com/

--difficult for me to comprehend most of that.
This bibliography was about the only thing I saw there that hit the spot:
http://www.bikexprt.com/music/tunebibl.htm
I'm definitely going to look at that stuff!

> As for the "tuning-math" group in particular, I believe it was only created in the first place because some people in the main tuning group objected to all the discussion of "exterior algebra", "homomorphisms of free abelian groups", "wedgies", "mapping matrices", and other math topics they (justifiably) thought were hardly related to music making. The "MakeMicroMusic" group is even more focused on practice and compositions rather than theory.
>
> Your ideas are definitely of general interest, so any new discussions should be started on the main tuning group.
>
> Keenan

--well, I *am* a mathematician... and I'm actually almost worthless as a musician
and there's plenty about biology I do not know either. My paper may make it sound
like I know a lot, which is because I read a ton of music books and converted
what parts I could into mathematical language (since the language they used struck me as just horribly disgusting in comparison). For a mathematician trying to learn about music theory, I'd definitely recommend my paper as a starting point in preference to all the music-theory books I ever saw. (Trying to talk that way is, for a mathematician, kind of like a trip to hell. And some music theories really seem to me to have no real mathematical content or meaning at all; it may seem otherwise naively, but it's really just hogwash.) But that creates the illusion that I'm really a music expert, which I'm not, or if I am it's in a very unusual sense.

So I'll stick to this yahoogroup for the moment.

🔗Graham Breed <gbreed@gmail.com>

10/6/2011 8:09:19 AM

"WarrenS" <warren.wds@gmail.com> wrote:

> Hmm, I guess this gives a mathematical meaning to the
> (previously rather undefined) term "vowel"???!!! [Which
> is rather weird since then the definition of "vowel"
> would basically be "it is consonant" i.e. exactly
> misnamed!]

AIUI, vowels are harmonic sounds with no noise, voiced
consonants are a mixture of a harmonic sound and noise, and
voiceless consonants are all noise. Given that there's
always some noise and the pitch is never exactly
constant, you never get an exactly periodic sound. Trained
singers will generally make vowel sounds with less noise
and less random pitch fluctuations.

I discovered last week that my local library gives me
access to Oxford Music Online, and therefore articles in
the New Grove. Maybe your library can get you that as
well, if you talk to them. I know some people think
libraries should all have access to JSTOR, but we can't
afford that round here. The Oxford package must be cheaper.

So, there's a big article called Acoustics, §VI: The
voice. It includes:

"The sound generated by the chopped transglottal airstream
is built up by a great number of harmonic partials whose
amplitudes generally decrease monotonically with frequency,
roughly by 12 dB per octave at neutral loudness."

This is a common feature of animal sounds and obviously
explains why we're so good at recognizing harmonic
timbres.
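
The quoted "12 dB per octave" figure corresponds to partial amplitudes falling roughly as 1/f^2; a quick sanity check of the arithmetic:

```python
import math

# Amplitude proportional to 1/f^2: doubling the frequency (one octave)
# divides the amplitude by 4, which in decibels is about -12 dB.
drop_db = 20 * math.log10((1 / 2**2) / (1 / 1**2))
print(round(drop_db, 2))   # -12.04
```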

Graham

🔗Carl Lumma <carl@lumma.org>

10/6/2011 8:59:27 AM

At 07:16 AM 10/6/2011, you wrote:
>For some reason the links I posted aren't working.

Those are temporary links. Links of the form

/tuning-math/files/KeenanPepper/voice-waveform.png

will work. -Carl

🔗WarrenS <warren.wds@gmail.com>

10/6/2011 10:24:58 AM

Looking in the book
Robert D. Rodman: Computer speech technology, Artech 1999,
(which is not a bad book for our purposes...)
table 1.9 page 38 gives the
"formant frequencies of 12 english monophthongal vowels"
in Hz spoken by a particular
"large male from Boston, LA, and North Carolina"
in one recording session which had multiple samples for each:

vowel....F1.....F2.....F3
e-bar....250...1950...2620
i........360...1720...2330
a-bar....480...1710...2300
e........490...1590...2240
a-vmark..620...1500...2200
a........540...1300...2500
u-vmark..510...1410...2050
o........660...1100...2150
o-^mark..540....920...2250
o........490...1010...2300
u........420...1050...2200
u........300....850...2250
The final u apparently was intended to be u-bar
and was misprinted.

Anyhow, these often are not nice small integer ratios --
so I conclude that at least for SPOKEN
vowels, the integer frequency ratio thing does not hold.
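
The claim is easy to check directly from the table above. A minimal Python sketch (formant values copied from a few rows of Rodman's table, reduced to exact fractions) shows most of the ratios need large denominators:

```python
from fractions import Fraction

# Formant frequencies (Hz) from a few rows of Rodman's table above.
formants = {
    "e-bar": (250, 1950, 2620),
    "i":     (360, 1720, 2330),
    "o":     (660, 1100, 2150),
    "u":     (300,  850, 2250),
}

for vowel, (f1, f2, f3) in formants.items():
    r21 = Fraction(f2, f1)   # exact reduced ratio of F2 to F1
    r31 = Fraction(f3, f1)
    print(f"{vowel:6s} F2/F1 = {r21} ({float(r21):.3f}), "
          f"F3/F1 = {r31} ({float(r31):.3f})")
```

For example "i" gives F2/F1 = 43/9, hardly a nice small integer ratio ("o" happens to reduce to 5/3, the exception rather than the rule here).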

Just the 3 formant freqs (and often just F1 and F2)
enable producing comprehensible vowel sounds. In fact during the years 1770-1900 various people built resonators with interesting weird internal shapes (devised by trial and error), such that if you blew air into the "A" resonator, it would produce the vowel sound "A" and so on (they had resonators for each of a large number of vowel sounds).

C.G.von Kratzenstein's resonators for A,E,I,O,U German sounds are pictured schematically fig 2.4 page 27 of Manfred Schroeder, Computer Speech, Springer (2nd ed) 2004. They won a prize of the Russian Imperial Academy of Sciences in 1779.

I do not think these 3 frequencies need be particularly sharp; in some spectrograms they are blobs, not sharp spectral peaks.

There is some discussion of singing in these books, and the claim is that mostly in opera singing, what you are hearing is vowels. Apparently you just cannot sing sustained consonants (or at most
very few of them?).

A diagram of the human vocal tract is fig 1.2 page 7 of
Rodman, and it is one heck of a weird shape,
not at all a simple long organ pipe shape, and of course
you alter the shape quite a lot during speaking -- it's
vaguely like a modern trumpet where you push the buttons which open
and close paths to different pipe-segments; you can move muscles
in your vocal tract which do similar stuff.
I would definitely not intuitively expect any shape this weird to produce resonant frequencies in nice simple integer ratios,
but my intuition might be wrong.

So it would seem puzzling how human singers can do it
(if and when they do it) -- but I suppose one ought to be able to use muscles to adjust the geometry just right to get one or two desired integer ratios, and then you can remember that configuration and re-use it.

🔗Keenan Pepper <keenanpepper@gmail.com>

10/6/2011 10:25:19 AM

--- In tuning-math@yahoogroups.com, Carl Lumma <carl@...> wrote:
>
> At 07:16 AM 10/6/2011, you wrote:
> >For some reason the links I posted aren't working.
>
> Those are temporary links. Links of the form
>
> /tuning-math/files/KeenanPepper/voice-waveform.png
>
> will work. -Carl

OK, thanks. So we have

/tuning-math/files/KeenanPepper/bowl-waveform.png
/tuning-math/files/KeenanPepper/bowl.pdf
/tuning-math/files/KeenanPepper/voice-waveform.png
/tuning-math/files/KeenanPepper/voice.png

Keenan

🔗Mike Battaglia <battaglia01@gmail.com>

10/6/2011 10:37:21 AM

On Thu, Oct 6, 2011 at 1:24 PM, WarrenS <warren.wds@gmail.com> wrote:
>
> Anyhow, these often are not nice small integer ratios,
> -- so I conclude that at least for SPOKEN
> vowels, the integer frequency ratio thing does not hold.

No, that's not true. The formants refer to the filtering
characteristics of the vocal tract. The glottal waveform is filtered
by these formants. If you're a mathematician, then precisely what's
happening is that the total output waveform is equal to the glottal
waveform, convolved with the impulse response of the vocal tract. In
the frequency domain, this equates to a pointwise multiplication of
the glottal waveform and the frequency response of the vocal tract,
which will peak at the formants listed. Since the frequency response
of the glottal waveform is 0 everywhere except for integer harmonics
of the fundamental, the frequency response of the output will also be
0 except for integer harmonics of the fundamental.
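
Mike's source-filter description can be sketched numerically. The toy below (the 730 Hz resonance and all other parameters are illustrative choices, not anyone's measured data) builds a perfectly harmonic "glottal" source, multiplies its spectrum pointwise by the frequency response of a single second-order resonance, and confirms that the strong output frequencies still fall only on harmonics of the fundamental:

```python
import numpy as np

fs, f0, n = 8000, 100, 8000     # sample rate, fundamental (Hz), 1 s
t = np.arange(n) / fs

# Harmonic "glottal" source: 30 partials with 1/k amplitudes.
source = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 31))

# One "formant": a 2nd-order resonance near 730 Hz (illustrative).
fc, r = 730.0, 0.98
w = 2 * np.pi * fc / fs
freqs = np.fft.rfftfreq(n, 1 / fs)           # bin spacing is exactly 1 Hz
z = np.exp(2j * np.pi * freqs / fs)
H = 1 / (1 - 2 * r * np.cos(w) / z + (r * r) / z**2)

# Output spectrum = source spectrum times filter response, pointwise.
S = np.abs(np.fft.rfft(source * np.hanning(n)))
out = S * np.abs(H)

top = sorted(int(i) for i in np.argsort(out)[-5:])   # strongest bins (Hz)
print(top)   # every strong bin is on (or adjacent to) a multiple of 100
```

The formant moves the *loudness* peak up near 700 Hz, but every spectral line is still an exact harmonic of 100 Hz.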

> So it would seem puzzling how human singers can do it
> (if and when they do it) -- but I suppose one ought to be able to use muscles to adjust the geometry just right to get one or two desired integer ratios, and then you can remember that configuration and re-use it.

What singers get good at doing is "shading" the vowels that they sing,
moving vowels like "ay" in the "eh" direction, for example, so that
the resulting formant shift coincides with one of the harmonics of the
glottal waveform.

-Mike

🔗Keenan Pepper <keenanpepper@gmail.com>

10/6/2011 11:46:02 AM

--- In tuning-math@yahoogroups.com, "WarrenS" <warren.wds@...> wrote:
> --all these URLs you are giving with the waveforms and spectra -- my browser says "file not found" for them all, unfortunately. I'd love to see them, perhaps you
> could post them in the FILES section of this yahoogroup?
> I might even want to steal them (with credit) for use in my paper
> (if you permit?).

Sure, you can use them for whatever you like. I can even supply different formats for publication-quality images.

I just posted the permanent links to the images, but here they are again:

/tuning-math/files/KeenanPepper/bowl-waveform.png
/tuning-math/files/KeenanPepper/bowl.pdf
/tuning-math/files/KeenanPepper/voice-waveform.png
/tuning-math/files/KeenanPepper/voice.pdf

> --fascinating. I had no idea this was the case. Why in hell IS it the case?
> I mean, how does one's voice manage to do that, physically? It would be explained if
> your vocal chords were in fact, stretched chords like guitar strings, but actually
> "vocal chords" are kind of fleshy flaps, not chords at all, so I see no physical reason this ought to be true. It also would be explained if you were setting up resonant 1-dimensional modes in the air inside a long organ pipe, but again, the human vocal cavity is not at all
> a long 1D pipe. Perhaps, then, the only possible explanation is some kind of active neuro-muscular feedback control system keeping everything exactly periodic? But this only seems to happen when singing a vowel without change -- it is not the case in normal song-singing and speaking, right?

You're making the time-honored mathematician's/physicist's mistake of assuming that everything is approximately linear. =)

You're absolutely correct that the linear eigenmodes of the throat and mouth cavities have incommensurate frequencies. So if you're *whispering* vowels, then you don't get periodic waveforms at all; you get some noise that's filtered to create peaks at those incommensurate frequencies, and no harmonic overtones at all. This is a similar situation to someone blowing into a flute (or ocarina) and creating noise, but not actually *playing* the flute.

However, if you're creating the original sound, not by a linear system that produces a sinusoid at each eigenfrequency, but by a single very nonlinear oscillator, you get a signal that is almost exactly periodic, but not even approximately sinusoidal. This kind of signal automatically contains harmonic overtones.

For example, in the electronics lab it's easy to create such a signal with a simple circuit called a "relaxation oscillator". This is actually much easier to build than a sine-wave oscillator because you just need something that switches back and forth between two voltages when its input signal crosses some threshold (e.g. an op-amp). Then you just hook it up to a capacitor in such a way that as soon as the capacitor charges up or down past the threshold, the output signal switches between low and high. This creates an approximate square wave.

Note that, because it's nonlinear, all the different harmonics are created simultaneously by the same process. There is no need to *tune* their frequencies to be harmonic overtones; they just *are*.

In the same way, there are many physical nonlinear oscillators that are used as musical instruments, and these automatically produce periodic but non-sinusoidal signals, that is, sounds with harmonic overtones. Examples include:

* Reed instruments (the reed moves back and forth but its motion is highly nonlinear, slamming into something and bouncing off rather than moving like a pendulum)

* Brass instruments (the same thing only with the player's lips)

* Bowed string instruments (This case is complicated, because if you *pluck* a string then it acts pretty much linearly and you only get approximately harmonic overtones because the string's eigenmodes are approximately harmonic. But when you bow a string you're causing a nonlinear autoresonance phenomenon that acts like a periodic driving force. I believe that if you replaced a violin's strings with strings of varying density, so that their natural overtones are inharmonic, not only would the violin sound worse, it would actually become *much more difficult to physically play* because the overtones no longer reinforce the autoresonance. Someone should try this experiment!)

So, in conclusion, there's no reason to believe anything complicated like "active neuro-muscular feedback control" is at work. The human vocal folds are simply a nonlinear oscillator, much like a reed in a reed organ, and the spectrum of the sound they produce has harmonic overtones simply because it's periodic but not sinusoidal.

A plucked string is approximately periodic because it has approximately harmonic overtones.

The voice has *exactly* harmonic overtones (exactly phase-locked I mean), because it is periodic.

> The fact that humans have this additional vocal capability (which I had not known about) I guess has to be regarded as further evidence for my whole "debugging theory" since
> why else would it be there?

I'm sure that other animals create sounds with harmonic overtones too. In fact I'm also sure they're sometimes created naturally with no help from living things, but I can't think of an example off the top of my head.

> Hmm, I guess this gives a mathematical meaning to the (previously rather undefined) term "vowel"???!!!
> [Which is rather weird since then the definition of "vowel" would basically be "it is consonant" i.e. exactly misnamed!]

Haha!

> If so, then this whole capability is useful for speech recognition?
> That you might say, UNDERMINES my theory since it offers an alternative explanation
> (although there's no reason the two explanations couldn't co-exist & synergise).
> To verify this, you'd really have to look at a few hundred waveforms for different speech (consonant & vowel & combination phoneme) sounds, several hundred I'd think.

Graham Breed is spot-on in his characterization of vowels as purely harmonic sounds, voiced consonants as harmonic + noise, and voiceless consonants as pure noise.

> Here's a random paper about phonemes and waveforms which offers absolutely no
> recognition of the whole "vowels are exactly periodic" hypothesis:
> http://eprints.pascal-network.org/archive/00005169/01/final.pdf
> I've read quite a lot of papers from the machine speech recognition community and all
> the ones I noticed so far offered no recognition of this whole idea, although they do not specifically refute it either.

Interesting. I would have thought anyone who studied speech sounds in detail would be aware of this basic fact.

> And what if you are just speaking a vowel normally, rather than singing it with intentional effort to keep it constant pitch?

Then the period of this "periodic" signal is changing rapidly.

However, it's still a single oscillator that's the origin of all the sound, so as I said, all the harmonics are going to be locked in phase with each other at all times. There's nothing "free-wheeling" at a different, incommensurate frequency.

> And what if you sing some consonants?

Depends what the consonant is.

If it's a voiceless consonant, like "f" or "s", the vocal folds aren't involved at all, so you wouldn't even be "singing" it but "whispering".

If it's a plosive consonant like "b" or "d", it's impossible to sing it sustained because there's no way for air to escape. The pressure just builds up.

If it's a nasal consonant like "m" or "n", the spectrum looks similar to a vowel, with perhaps a larger "noise floor". The peaks are still perfectly harmonic though.

Fricative consonants like "v" or "z" also have harmonic overtones, but here the noise component is very prominent.

> --difficult for me to comprehend most of that.
> This bibliography was about the only thing I saw there that hit the spot:
> http://www.bikexprt.com/music/tunebibl.htm
> I'm definitely going to look at that stuff!

Yes, good idea. Helmholtz is good, Partch is good; I've heard that Blackwood is good but haven't read it personally.

Mike Battaglia and I plan to write a short introduction to xenharmonic tunings and regular temperament theory sometime.

> --well, I *am* a mathematician... and I'm actually almost worthless as a musician
> and there's plenty about biology I do not know either. My paper may make it sound
> like I know a lot, which is because I read a ton of music books and converted
> what parts I could into mathematical language (since the language they used struck me as just horribly disgusting in comparison). For a mathematician trying to learn about music theory I'd definitely recommend my paper as a starting point in preference to all the music-theory books I ever saw. (Trying to talk that way, for a mathematician, is kind of like a trip to hell. And some music theories really seem to me to have no real mathematical content/meaning at all; they may naively appear to have some, but it's really just hogwash.) But that creates the illusion that I'm really a music expert, which I'm not, or if I am it's in a very unusual sense.

In that case you're going to LOVE it here. I can tell that you and Gene Ward Smith are going to get along fabulously.

Keenan

🔗Keenan Pepper <keenanpepper@gmail.com>

10/6/2011 11:50:22 AM

--- In tuning-math@yahoogroups.com, "Keenan Pepper" <keenanpepper@...> wrote:
> > This bibliography was about the only thing I saw there that hit the spot:
> > http://www.bikexprt.com/music/tunebibl.htm
> > I'm definitely going to look at that stuff!
>
> Yes, good idea. Helmholtz is good, Partch is good; I've heard that Blackwood is good but haven't read it personally.

Another book I strongly recommend is Bill Sethares's "Tuning, Timbre, Spectrum, Scale". Very well written and talks about everything we've just been discussing, with audio examples.

I personally disagree with some of his main conclusions, mostly because of the ideas we've just been talking about, that voice is harmonic and humans evolved to understand harmonic sounds, but it's still a great book.

Keenan

🔗WarrenS <warren.wds@gmail.com>

10/6/2011 12:33:53 PM

John R Pierce: Science of musical sound,
Scient. American books 1983,
pages 176-177
says formants are spectral blobs corresponding to the 3 most important resonances of the vocal tract. The vocal cords excite numerous frequencies, but only the frequencies within the formant ranges become loud, so the voice has greater intensity in certain regions of the spectrum. Gives a picture.
The formants in his picture look similar to the classical Lorentzian curve of intensity vs frequency for a driven damped harmonic oscillator.

Then Pierce gives this table without any citation or explanation
Vowel....F1....F2.....F3
Heed....270...2290...3010
Hid.....390...1990...2550
Head....530...1840...2480
Had.....660...1720...2410
Hod.....730...1090...2440
Hawed...570....840...2410
Hood....440...1020...2240
Who'd...300....870...2240
Hud.....640...1190...2390
Heard...490...1350...1690

Also Pierce claims that singing and speaking are pretty different; there is a "singer's formant" not present during speaking.

He claims singers and Buddhist chanters have precise control of their formant frequencies and that is what makes them sound good.

🔗Mike Battaglia <battaglia01@gmail.com>

10/6/2011 12:48:54 PM

On Thu, Oct 6, 2011 at 3:33 PM, WarrenS <warren.wds@gmail.com> wrote:
>
> John R Pierce: Science of musical sound,
> Scient. American books 1983,
> pages 176-177
> says formants are spectral blobs corresponding to the 3 most important resonances of the vocal tract. The vocal cords excite numerous frequencies, but only the frequencies within the formant ranges become loud, so the voice has greater intensity in certain regions of the spectrum. Gives a picture.
> The formants in his picture look similar to the classical Lorentzian curve of intensity vs frequency for a driven damped harmonic oscillator.

Right. They're the three resonances of the vocal tract, but that
doesn't mean that those frequencies are present in the output signal.

Think about a Helmholtz resonator - an empty beer bottle. If you blow
across the top of this bottle, you'll hear a certain frequency
resonate. Now if you load up a tone generator and play a sine wave at
that frequency out of a speaker, very loudly, next to the bottle,
you'll feel the bottle vibrate in sympathy. However, if you load up a
tone generator and play a sine wave at some other pitch, you won't
feel the bottle vibrate. The bottle is acting as one of the
"formants," for example.

Now consider that if you play a sine wave corresponding to the
resonance into that bottle, the bottle won't resonate forever - it'll
stop vibrating an imperceptibly short amount of time after you stop
exciting it. One intuitive way to see why this is is to smack the top
of it, producing a rapidly decaying, but pitched "thwock" sound,
what's at times been called a "plucked bottle" sound. Assuming that
your hand is infinitely small and hard, and you only make contact with
the bottle for an infinitesimal amount of time, this will produce the
impulse response of the bottle, which will both look and sound like a
damped sinusoid.

This can be thought of as a sinusoid times some sort of damping
envelope. Multiplication in the time domain is convolution in the
frequency domain, and hence the spectrum of the output waveform will
look like the convolution of two Dirac delta functions (at plus and
minus the resonant frequency) with the Fourier transform of the
damping envelope.
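
Mike's impulse-response picture is easy to reproduce. The sketch below (the resonant frequency and decay time are arbitrary illustrative choices) builds a damped sinusoid and measures its spectrum: instead of a sharp line you get a peak of finite width, i.e. one of Pierce's "blobs":

```python
import numpy as np

fs, n = 8000, 8000
t = np.arange(n) / fs

# Impulse response of one resonance: a damped sinusoid ("plucked
# bottle"). fc and tau are arbitrary illustrative choices.
fc, tau = 730.0, 0.02            # resonance (Hz), decay time (s)
h = np.exp(-t / tau) * np.sin(2 * np.pi * fc * t)

spec = np.abs(np.fft.rfft(h))    # bins are 1 Hz apart
peak = int(np.argmax(spec))
above = np.nonzero(spec > spec.max() / 2)[0]
width = int(above[-1] - above[0])    # width of the "blob" at half height
print(peak, width)    # peak near 730 Hz, tens of Hz wide, not a line
```

A shorter decay time tau makes the blob wider (the width scales like 1/tau), which is exactly the Lorentzian behavior Pierce's figure shows.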

The formants of the vocal tract, which originate in resonant cavities
in the chest and head, act analogously. The fact that you don't
continue to resonate forever when you talk, which I assume was an
evolutionarily useful feature, and doubly useful in that it doesn't
violate the second law of thermodynamics, means that the formant
"peaks" will be "blobs," as Pierce so colorfully described them.

-Mike

🔗Keenan Pepper <keenanpepper@gmail.com>

10/6/2011 1:48:35 PM

--- In tuning-math@yahoogroups.com, Mike Battaglia <battaglia01@...> wrote:
> Right. They're the three resonances of the vocal tract, but that
> doesn't mean that those frequencies are present in the output signal.
>
> Think about a Helmholtz resonator - an empty beer bottle. If you blow
> across the top of this bottle, you'll hear a certain frequency
> resonate. Now if you load up a tone generator and play a sine wave at
> that frequency out of a speaker, very loudly, next to the bottle,
> you'll feel the bottle vibrate in sympathy. However, if you load up a
> tone generator and play a sine wave at some other pitch, you won't
> feel the bottle vibrate. The bottle is acting as one of the
> "formants," for example.

Right, and another cool thing related to this discussion is that if you play a *square* wave at the bottle at 1/3 of the bottle's resonant frequency, the bottle will resonate in response to the 3rd harmonic of the square wave.

> The formants of the vocal tract, which originate in resonant cavities
> in the chest and head, act analogously. The fact that you don't
> continue to resonate forever when you talk, which I assume was an
> evolutionarily useful feature, and doubly useful in that it doesn't
> violate the second law of thermodynamics, means that the formant
> "peaks" will be "blobs," as Pierce so colorfully described them.

Hahaha. But wouldn't it be kinda cool to have an ideal lossless cavity as your vocal tract? You could sing complicated chords just by singing each note in succession and letting them ring. Of course, the only way you could ever shut up would be to carefully sing all the notes you had already sung, but exactly out of phase so they cancel out.

Those darn laws of thermodynamics would be a problem if you wanted your voice to radiate out so people could hear it, though.

Keenan

🔗WarrenS <warren.wds@gmail.com>

10/6/2011 4:22:53 PM

Ok, I used "audacity" software to record myself just speaking a few vowel-containing words
such as "gloom" using ordinary speaking voice (though unusual in the sense they were just isolated words, not sentences).

Result: "gloom" vowels clearly are NOT made of lovely integer harmonic ratios
and forming a perfectly periodic signal; instead the waveform changes during the course
of the vowel. It may look pretty periodic if you look at only 2 consecutive periods, but if you look at 5 I can clearly see big changes.
Ditto "groove," "mat" and the o in "boffin."

Dayton C. Miller: The science of musical sounds, Macmillan 1922
has a lot of waveform pictures he made with a device he'd invented, based on a vibrating rotating mirror deflecting a light beam, which he then photographed.
Pages 76,77,140,141,205,206,229,231,240,250.
He finds that sung or "intoned" vowels (I think he was always extending the vowel an artificially long time since otherwise he would have had trouble triggering his apparatus
at the right moment) always have exactly periodic waveforms, or at least it sure looks that way if you look at about 7 consecutive periods. For comparison a bell (p.141) produces a highly nonperiodic waveform due to it having incommensurate resonance frequencies.

So, CONCLUSION, apparently your brain listens to yourself talk, and if you are singing a prolonged vowel, or "intoning" it, your brain sees if you have imperfect
sounds (noninteger ratios) and takes corrective action via your muscles somehow, with the result that you emit excellent integer ratios. If however you are just speaking normally,
your brain does not have enough time to make very good adjustments (or does not try?) and hence the quality of the integer ratios is poorer.

At least, that's the only way I can see to explain this. And it is quite amazing, because
in some of Dayton Miller's waveforms involving bass singing voice, the 15th harmonic was significant, so that means your brain's control feedback system can simultaneously adjust 15 frequencies to get them nicely synced up. That just seems incredible.
I have trouble believing my eyes.

In addition to that, I still am unsure how much significance we should ascribe to this in the sense that it doesn't happen (at least not to high accuracy) during normal speech.

🔗Herman Miller <hmiller@prismnet.com>

10/6/2011 5:34:04 PM

On 10/6/2011 7:22 PM, WarrenS wrote:
>
> Ok, I used "audacity" software to record myself just speaking a few vowel-containing words
> such as "gloom" using ordinary speaking voice (though unusual in the sense they were just isolated words, not sentences).
>
> Result: "gloom" vowels clearly are NOT made of lovely integer harmonic ratios
> and forming a perfectly periodic signal; instead the waveform changes during the course
> of the vowel. It may look pretty periodic if you look at only 2 consecutive periods, but if you look at 5 I can clearly see big changes.
> Ditto "groove," "mat" and the o in "boffin."

When you're speaking a word like "gloom" or "mat", your tongue and lips are moving continuously without pausing in one spot. If you were to hold your tongue and lips in position for the "oo" in "gloom" without moving them, and speak in a constant pitch, you'd see a more clearly periodic waveform. The change in the spectrum over time between consonants and vowels is part of what helps us to identify consonants (particularly voiceless stops, which are essentially silent).

Inharmonicity in sounds like plucked strings also causes the waveform to change over time, but not in the same way. The phase of the higher overtones is constantly shifting, which is different from what you see with vocal sounds. What you see in the vowel sounds of spoken words is just the shifting resonances of the vocal tract emphasizing different harmonics of the voice sound.

Since vocal sounds are organic rather than mechanical, they won't repeat perfectly, but one thing they don't do is steadily go out of phase in the way that inharmonic plucked strings do. So if you sing in a good approximation to just intonation, you get that distinct "locked-in" effect that you can hear in some barbershop quartet music for instance. That wouldn't work if voice sounds were inharmonic.

🔗Keenan Pepper <keenanpepper@gmail.com>

10/6/2011 6:47:26 PM

--- In tuning-math@yahoogroups.com, Herman Miller <hmiller@...> wrote:
>
> On 10/6/2011 7:22 PM, WarrenS wrote:
> >
> > Ok, I used "audacity" software to record myself just speaking a few vowel-containing words
> > such as "gloom" using ordinary speaking voice (though unusual in the sense they were just isolated words, not sentences).
> >
> > Result: "gloom" vowels clearly are NOT made of lovely integer harmonic ratios
> > and forming a perfectly periodic signal; instead the waveform changes during the course
> > of the vowel. It may look pretty periodic if you look at only 2 consecutive periods, but if you look at 5 I can clearly see big changes.
> > Ditto "groove," "mat" and the o in "boffin."
>
> When you're speaking a word like "gloom" or "mat", your tongue and lips
> are moving continuously without pausing in one spot. If you were to hold
> your tongue and lips in position for the "oo" in "gloom" without moving
> them, and speak in a constant pitch, you'd see a more clearly periodic
> waveform. The change in the spectrum over time between consonants and
> vowels is part of what helps us to identify consonants (particularly
> voiceless stops, which are essentially silent).
>
> Inharmonicity in sounds like plucked strings also causes the waveform to
> change over time, but not in the same way. The phase of the higher
> overtones is constantly shifting, which is different from what you see
> with vocal sounds. What you see in the vowel sounds of spoken words is
> just the shifting resonances of the vocal tract emphasizing different
> harmonics of the voice sound.
>
> Since vocal sounds are organic rather than mechanical, they won't repeat
> perfectly, but one thing they don't do is steadily go out of phase in
> the way that inharmonic plucked strings do. So if you sing in a good
> approximation to just intonation, you get that distinct "locked-in"
> effect that you can hear in some barbershop quartet music for instance.
> That wouldn't work if voice sounds were inharmonic.

Yes! Exactly! This is basically what I was trying to say, but I think you said it better, Herman.

It's not that vowel sounds are perfectly periodic; of course they're not. But there is no "phase slipping" as there would be for something like a bell with independent normal modes.

Keenan

🔗Mike Battaglia <battaglia01@gmail.com>

10/6/2011 6:51:15 PM

On Thu, Oct 6, 2011 at 8:34 PM, Herman Miller <hmiller@prismnet.com> wrote:
>
> When you're speaking a word like "gloom" or "mat", your tongue and lips
> are moving continuously without pausing in one spot. If you were to hold
> your tongue and lips in position for the "oo" in "gloom" without moving
> them, and speak in a constant pitch, you'd see a more clearly periodic
> waveform. The change in the spectrum over time between consonants and
> vowels is part of what helps us to identify consonants (particularly
> voiceless stops, which are essentially silent).

Right. And the harmonics in the STFT are going to, at any moment, be
perfect harmonics of one another.

If you want to see a perfectly harmonic signal, you have to say the
word "gloom" forever, never hitting the "m" part. Then you have to
arrange your axes so that the "gl" is before 0, and then take the
forward Fourier transform. Just make sure you keep saying "gloom"
while you do all this.

Good to hear from you again Herman, it's been a while.

-Mike

🔗Keenan Pepper <keenanpepper@gmail.com>

10/6/2011 6:59:14 PM

--- In tuning-math@yahoogroups.com, "WarrenS" <warren.wds@...> wrote:
> Ok, I used "audacity" software to record myself just speaking a few vowel-containing words
> such as "gloom" using ordinary speaking voice (though unusual in the sense they were just isolated words, not sentences).

Ok, then you're going to get a harmonic sound where the frequency is changing quite rapidly (probably falling because you said the word with falling intonation). I'm curious how you're going to get a spectrum of this...

> Result: "gloom" vowels clearly are NOT made of lovely integer harmonic ratios
> and forming a perfectly periodic signal; instead the waveform changes during the course
> of the vowel. It may look pretty periodic if you look at only 2 consecutive periods, but if you look at 5 I can clearly see big changes.
> Ditto "groove," "mat" and the o in "boffin."

Yes, of course. The frequency of your voice is changing as you say the word. You can also play a fast glissando on a double bass. Does that mean the string doesn't still have integer harmonics?

If you took a spectrum of the entire sound clip, that's not going to tell you what the instantaneous frequency content of the sound is at any point in time. It's the frequency content of the whole clip averaged together. If you take a bunch of harmonic spectra with different fundamental frequencies and average them all together, the harmonics get washed out so you can't see them.

Here's something to try: In audacity, click where it says "Audio track" and then click "Spectrum" to view a time-frequency spectrogram rather than a spectrum of the whole clip at once. You ought to see a series of curves that move down over time, but at any given instant of time they form a perfect harmonic series.

> Dayton C. Miller: The science of musical sounds, Macmillan 1922
> has a lot of waveform pictures he made with a device he'd invented, based on a vibrating rotating mirror deflecting a light beam, which he then photographed.
> Pages 76,77,140,141,205,206,229,231,240,250.
> He finds that sung or "intoned" vowels (I think he was always extending the vowel an artificially long time since otherwise he would have had trouble triggering his apparatus
> at the right moment) always have exactly periodic waveforms, or at least it sure looks that way if you look at about 7 consecutive periods. For comparison a bell (p.141) produces a highly nonperiodic waveform due to it having incommensurate resonance frequencies.

Right.

> So, CONCLUSION, apparently your brain listens to yourself talk, and if you are singing a prolonged vowel, or "intoning" it, your brain sees if you have imperfect
> sounds (noninteger ratios) and takes corrective action via your muscles somehow, with the result that you emit excellent integer ratios. If however you are just speaking normally,
> your brain does not have enough time to make very good adjustments (or does not try?) and hence the quality of the integer ratios is poorer.

No no no, this conclusion is incorrect. Did you read my most recent response to you? It explains that this happens simply because the vocal folds are a nonlinear oscillator that produces periodic but non-sinusoidal waves. It has nothing to do with any kind of biological feedback.

The exact same thing happens with, for example, a bagpipe reed, when you're just pushing air through it and not controlling it at all.

> At least, that's the only way I can see to explain this. And it is quite amazing, because
> in some of Dayton Miller's waveforms involving bass singing voice, the 15th harmonic was significant, so that means your brain's control feedback system can simultaneously adjust 15 frequencies to get them nicely synced up. That just seems incredible.
> I have trouble believing my eyes.

You should, because you're thinking of it the wrong way. It would indeed be incredible if the brain did this, but it's really just the physics of the vocal folds doing it.

> In addition to that, I still am unsure how much significance we should ascribe to this in the sense that it doesn't happen (at least not to high accuracy) during normal speech.

Does too. =)

Keenan

🔗Mike Battaglia <battaglia01@gmail.com>

10/6/2011 9:40:37 PM

On Thu, Oct 6, 2011 at 9:59 PM, Keenan Pepper <keenanpepper@gmail.com> wrote:
>
> > In addition to that, I still am unsure how much significance we should ascribe to this in the sense that it doesn't happen (at least not to high accuracy) during normal speech.
>
> Does too. =)

I'd like to weigh in though and state that I think there's a lot of
potential for this "debugging" idea in general, outside of this
specific notion that we're fixating on about the human vocal tract. In
as much as the word "debugging" can be substituted with
"self-medicating," I say you're on the right track. But why must it be
that we're debugging the auditory system - why not debugging the
psyche?

As an example, some people might feel they've "lost" some sort of
vividness of perception as adults; that they experienced the world more
intensely as children. Some might feel that they had some degree of
synesthesia, or a freedom of creative association, that was lost in the
transition to adulthood. More precisely, it could be stated that, in
some individually dependent fashion, the increase in stress that
adulthood brings can adversely affect one's physical perception of the
world in subtle ways. Music can at times reactivate those forms of
perception, which could be said to be one form of "debugging" that
music or any art form can provide - testing to see what's going
"wrong" and how to fix it, etc.

Of course, everyone's different, and everyone faces their own
problems. That notwithstanding, I think it's a pretty universal human
theme that people listen to music to learn, to explore, to escape,
perhaps even to self-medicate, and that this is a testament to the
"debugging" paradigm.

"One good thing about music - when it hits, you feel no pain." - Bob Marley

-Mike

🔗WarrenS <warren.wds@gmail.com>

10/7/2011 11:10:39 AM

OK, thank you all, you're all making a lot of sense.
I'm not necessarily going to buy everything 100% immediately, but you are making a lot of sense.

So the current explanation is that your vocal apparatus is a nonlinear oscillator and thus emits perfectly periodic but not sinewave oscillations, and if you keep singing/intoning the same vowel without moving lips like in normal speech, you get excellent integer ratios in spectra.

At least some nonlinear oscillators do not behave that way, e.g. they yield
very nonperiodic "chaos." But some do act that way. For example a pendulum
with large swing angle is a nonlinear oscillator and is perfectly periodic and
non-sine-wave. A square wave produced by a ball bouncing between two parallel walls is
another example.
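The pendulum case can be checked directly. The sketch below (parameters chosen arbitrarily) integrates a pendulum at a large swing angle and measures its period from zero crossings: the motion is strongly nonlinear, so the period is much longer than the small-angle prediction, yet it is the same every cycle, i.e. perfectly periodic.

```python
import numpy as np

# Symplectic-Euler integration of a pendulum: theta'' = -(g/L)*sin(theta).
# At a large swing angle this is a strongly nonlinear oscillator, yet the
# motion is still perfectly periodic -- just not a sine wave.
g, L, dt = 9.81, 1.0, 1e-4
theta, omega = 2.5, 0.0            # ~143 degree amplitude: far from linear
steps = int(20/dt)                 # simulate 20 seconds
th = np.empty(steps)
for i in range(steps):
    omega += -(g/L)*np.sin(theta)*dt
    theta += omega*dt
    th[i] = theta

# Measure the period from successive downward zero crossings of theta
crossings = np.where((th[:-1] > 0) & (th[1:] <= 0))[0]
periods = np.diff(crossings)*dt
T_nonlinear = periods.mean()
T_small = 2*np.pi*np.sqrt(L/g)     # small-angle (linear) period, ~2.0 s

# The nonlinear period is much longer than the linear prediction, but it
# is the SAME every cycle: periodic motion, hence exact-integer harmonics.
print(T_nonlinear, T_small, periods.std())
```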

Seems a bit weird we should enter the nonlinear regime, which for elastic oscillations would seem to imply we've got very severe stresses and strains approaching the tensile strength of the material (small stresses & strains stay in the linear regime). That is
hard to believe. Another possibility is that complicated fluid-turbulence phenomena give
the effect of a nonlinear oscillator. For example "edge tones" caused by air blowing past a knife edge (setting up vortices and oscillation in the air) probably are very complicated.

OK... an idea occurred to me to test this whole hypothesis.
I again fire up "audacity" and I say+record 5 repetitions of the same word "head."
I examine the waveforms in the vowel portion of the word.
The question is: am I going to see the same waveform in head#1, as I see in head#2?
I should get the same power spectrum for all heads, since they all sound
essentially the same. But I might nevertheless get completely different looking waveforms,
due to different phase shifts (which you cannot hear) between the Fourier components.
IF they differ a lot in shape, that'd say it is a linear phenomenon with all the
different harmonics evolving independently and starting at random relative phases. But if the waveforms stay the same shape, that'd militate in favor of the nonlinear hypothesis.
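The logic of this test can be sketched in a few lines of numpy (the harmonic amplitudes here are arbitrary): two signals built from the same harmonic amplitudes but different relative phases have identical power spectra, yet completely different waveform shapes.

```python
import numpy as np

# Two "vowels" built from the same harmonic amplitudes (same power
# spectrum) but different relative phases between the harmonics.
rng = np.random.default_rng(0)
n = 1024
harm = np.arange(1, 8)            # harmonics 1..7 of the fundamental
amps = 1.0/harm                   # identical amplitude recipe for both

t = np.arange(n)/n                # fundamental completes one cycle here
def waveform(phases):
    return sum(a*np.sin(2*np.pi*k*t + p)
               for a, k, p in zip(amps, harm, phases))

x1 = waveform(np.zeros(len(harm)))                     # all phases zero
x2 = waveform(rng.uniform(0, 2*np.pi, len(harm)))      # random phases

s1, s2 = np.abs(np.fft.rfft(x1)), np.abs(np.fft.rfft(x2))
print(np.allclose(s1, s2, atol=1e-6))   # True: identical power spectra
print(np.max(np.abs(x1 - x2)))          # large: very different waveshapes
```

So if successive "heads" kept the same waveform shape, the relative phases of the harmonics must be locked together, which is what a nonlinear oscillator does and independent linear modes would not.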

Well... the waveforms do look pretty similar.
Incidentally I have a bass-baritone voice, which means my waveforms are pretty interesting (the more bass you are, the more interesting your waveshapes, according to Miller).

OK, trying again with "bah," each word said slowly ("baaaaahhhh," duration about 1.7 sec). Confirming previous claims, I now see a very periodic (albeit very complex;
it has 7 peaks) waveshape with 120.5 Hz fundamental frequency.
The only noticeable departure from perfect periodicity is comparatively slow
(about 12 Hz) amplitude variation (which itself looks roughly sinewave?)
by about a 3:2 ratio max:min.
And WOW, my first three bahs have almost exactly the same complicated waveform!!

OK, nonlinearity hypothesis CONFIRMED!

I don't know if this exactly counts as an ironclad proof, but it is neat.
Who invented this whole theory?

🔗WarrenS <warren.wds@gmail.com>

10/7/2011 11:16:50 AM

--- In tuning-math@yahoogroups.com, Mike Battaglia <battaglia01@...> wrote:
>
> On Thu, Oct 6, 2011 at 9:59 PM, Keenan Pepper <keenanpepper@...> wrote:
> >
> > > In addition to that, I still am unsure how much significance we should ascribe to this in the sense that it doesn't happen (at least not to high accuracy) during normal speech.
> >
> > Does too. =)
>
> I'd like to weigh in though and state that I think there's a lot of
> potential for this "debugging" idea in general, outside of this
> specific notion that we're fixating on about the human vocal tract. In
> as much as the word "debugging" can be substituted with
> "self-medicating," I say you're on the right track. But why must it be
> that we're debugging the auditory system - why not debugging the
> psyche?

--well, actually, yes, my paper already has that idea, and discusses it a lot.
Evidence re "music therapy" examined; also abstract mathematical debugging notions
examined in mathematical pseudo-brains fed mathematical pseudo-music.

Yes, this vocal tract thing indeed is a fixation on a minor part of the paper, but it is interesting and does indicate I should revise that part.

🔗Carl Lumma <carl@lumma.org>

10/7/2011 12:14:26 PM

Hi Warren,

>Seems a bit weird we should enter the nonlinear regime, which for
>elastic oscillations would seem to imply we've got very severe
>stresses and strains approaching the tensile strength of the material
>(small stresses & strains is in the linear regime). That is
>hard to believe.

See http://www.phys.unsw.edu.au/jw/voice.html#sound

>Who invented this whole theory?

Which? The whole 'human hearing is highly adapted to extract
information from human vocalizations'? It seems fairly obvious.
'Music is a kind of synthetic speech' can probably also be
traced back very far. Some interesting recent applications of
these ideas in neuroanatomy are cited here

http://lumma.org/tuning/faq/#harmonictemplate

-Carl

🔗WarrenS <warren.wds@gmail.com>

10/7/2011 12:36:43 PM

--- In tuning-math@yahoogroups.com, Carl Lumma <carl@...> wrote:
>
> Hi Warren,
>
> >Seems a bit weird we should enter the nonlinear regime, which for
> >elastic oscillations would seem to imply we've got very severe
> >stresses and strains approaching the tensile strength of the material
> >(small stresses & strains is in the linear regime). That is
> >hard to believe.
>
> See http://www.phys.unsw.edu.au/jw/voice.html#sound
>
> >Who invented this whole theory?
> which?

--that we get a lot of exact-integer harmonics in the human vocal tract due to nonlinear oscillations.
Here's a candidate for the answer:
Neville H Fletcher, author of book
The Physics of Musical Instruments (1991).
His 1993 paper
http://www.phys.unsw.edu.au/music/people/publications/Fletcher1993.pdf

points out that a "pressure controlled gas flow valve" can be
a nonlinear oscillator since the Bernoulli pressure from a flowing fluid is
proportional to its flow speed SQUARED.

Your vocal folds ("cords") act that way and also interact with each other, so
it could be a pretty complicated nonlinear system.
So yes, this appears to clear up the mystery.
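The quadratic Bernoulli term alone already generates exact-integer harmonics, as a quick numerical check shows (a deliberately oversimplified model, not a real valve simulation):

```python
import numpy as np

# Pure-tone airflow u(t) through a quadratic (Bernoulli, p ~ u^2)
# nonlinearity: f0 = 100 Hz in, energy out at an exact integer multiple.
fs, f0, n = 8000, 100, 8000
t = np.arange(n)/fs
u = np.sin(2*np.pi*f0*t)
p = u**2                                 # Bernoulli-style v-squared term

spec = np.abs(np.fft.rfft(p))/n
freqs = np.fft.rfftfreq(n, 1/fs)
# sin^2(x) = (1 - cos(2x))/2, so all the AC energy lands at exactly 2*f0.
# In a real valve this pressure feeds back into the flow, cascading the
# effect into a whole series of exact-integer harmonics.
print(freqs[np.argmax(spec[1:]) + 1])    # 200.0
```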

🔗Carl Lumma <carl@lumma.org>

10/7/2011 1:06:14 PM

Warren wrote:

>> >Who invented this whole theory?
>> which?
>
>--that we get a lot of exact-integer harmonics in the human vocal tract
>due to nonlinear oscillations.
>Here's a candidate for the answer:
>Neville H Fletcher, author of book
>The Physics of Musical Instruments (1991).
>His 1993 paper
> http://www.phys.unsw.edu.au/music/people/publications/Fletcher1993.pdf

Oh no, it's been known since the '50s at least. This
seems to be the canonical reference:

van den Berg, J (1958)
"Myoelastic-aerodynamic theory of voice production"
Journal of Speech and Hearing Research

-Carl

🔗Mike Battaglia <battaglia01@gmail.com>

10/7/2011 1:28:08 PM

On Fri, Oct 7, 2011 at 2:10 PM, WarrenS <warren.wds@gmail.com> wrote:
>
> OK, thank you all, you're all making a lot of sense.
> I'm not necessarily going to buy everything 100% immediately, but you are making a lot of sense.
>
> So the current explanation is that your vocal apparatus is a nonlinear oscillator and thus emits perfectly periodic but not sinewave oscillations, and if you keep singing/intoning the same vowel without moving lips like in normal speech, you get excellent integer ratios in spectra.

No, it's just that you need to sing/intone the same vowel without
changing the pitch. Moving your lips will change the amplitude of each
harmonic, but won't skew their position in the spectrum.
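This source-filter separation is easy to demonstrate numerically. The sketch below uses a made-up single-resonance "formant" filter (the 700 Hz center and bandwidth are illustrative, not measured vocal-tract values): filtering reshapes the harmonic amplitudes, but every peak stays at an exact multiple of the fundamental.

```python
import numpy as np

# Source-filter sketch: a glottal-like source rich in exact harmonics of
# f0 = 100 Hz, shaped by one made-up "formant" resonance near 700 Hz.
fs, f0, n = 8000, 100, 8000
t = np.arange(n)/fs
source = sum(np.sin(2*np.pi*k*f0*t)/k for k in range(1, 21))

freqs = np.fft.rfftfreq(n, 1/fs)
formant = 1/(1 + ((freqs - 700)/200)**2)   # bell-shaped gain curve
filtered = np.fft.irfft(np.fft.rfft(source)*formant, n)

spec_in = np.abs(np.fft.rfft(source))
spec_out = np.abs(np.fft.rfft(filtered))
# Harmonic positions are unchanged at 100, 200, 300, ... Hz; only their
# relative heights move. The strongest harmonic shifts from 100 Hz (the
# raw source) to 700 Hz (the formant peak).
print(freqs[np.argmax(spec_in)], freqs[np.argmax(spec_out)])
```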

> Seems a bit weird we should enter the nonlinear regime, which for elastic oscillations would seem to imply we've got very severe stresses and strains approaching the tensile strength of the material (small stresses & strains is in the linear regime). That is
> hard to believe. Another possibility is that complicated fluid turbulence phenomena give
> the effect of a nonlinear oscillator. For example "edge tones" caused by air blowing past a knife edge (setting up vortices and oscillation in the air) probably are very complicated.
//snip
> I don't know if this exactly counts as an ironclad proof, but it is neat.
> Who invented this whole theory?

You should read this:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2811547/?tool=pubmed

This is the state of the art theory in vocal production today, as far
as I know. Here's a relevant paragraph:

"The skewing of the flow pulse guarantees a dominant excitation near
glottal closing and raises the energy in the harmonics (Fant, 1986).
In the past it has been assumed that the harmonic spectrum of the
source comes primarily from vocal fold collision. This may be true for
many phonations, but this example shows that vocal fold collision is
not essential to produce source harmonics. Nonlinear source-filter
coupling can produce a spectrum of source frequencies, with a spectral
slope of about -15 dB per octave in this case. Furthermore, the
harmonic amplitudes are affected by the reactance curve."

From the above we can gather that for many phonations (but not all),
it IS true that vocal fold collision causes the harmonic spectrum. If
so, you can think of vocal fold collision as "clipping" the motion of
the oscillators. If you've worked with analyzing memoryless
nonlinearities before, and the use of Taylor series in understanding
what they do to the frequency spectrum of the input signal, it'll be
apparent to you how this generates harmonics. For the rest, it seems
like they're saying it comes from nonlinear coupling junctions
(perhaps scattering junctions?) between source and filter; e.g.
between the vocal folds and vocal tract resonances. As a first
approximation you can consider the nonlinear junctions that occur when
you couple a reed to a long pipe, as in a clarinet.
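The memoryless-nonlinearity point is easy to see numerically. Here tanh stands in as a generic soft-clipping curve (not a model of actual vocal-fold mechanics): a single pure tone goes in, and energy comes out at exact odd-integer multiples of its frequency.

```python
import numpy as np

# A memoryless "clipping" nonlinearity applied to a pure 100 Hz tone.
# Taylor-expanding tanh gives x - x^3/3 + 2x^5/15 - ..., and odd powers
# of sin(wt) contain sin(wt), sin(3wt), sin(5wt), ... -- so new energy
# lands exactly on odd-integer harmonics of the input.
fs, f0, n = 8000, 100, 8000
t = np.arange(n)/fs
clean = np.sin(2*np.pi*f0*t)      # one partial, no harmonics at all
clipped = np.tanh(3*clean)        # memoryless nonlinearity ("clipping")

spec = np.abs(np.fft.rfft(clipped))/n
freqs = np.fft.rfftfreq(n, 1/fs)
harmonics = freqs[spec > 1e-3]    # frequencies with significant energy
print(harmonics[:4])              # [100. 300. 500. 700.]
```

An asymmetric nonlinearity (vocal folds collide on one side of the cycle only) would fill in the even harmonics as well.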

In general though, some good resources on the acoustics of waveguides
(and modeling them digitally) can be found here:

https://ccrma.stanford.edu/~jos/pasp/pasp.html

-Mike