
Re: Perception of pitch

🔗Robert Walker <robertwalker@ntlworld.com>

3/18/2002 9:04:28 PM

Hi Paul,

> Much has been made of the 3- and 4- cent deviations from JI in 72-tET
> chords. At a typical musical frequency of 440Hz, 4 cents is a 1Hz
> deviation. So the classical uncertainty principle would seem to say
> that, for frequency to be determined to better than this accuracy,
> the note would have to be played for 2*pi, or over 6 seconds! Clearly
> most music has melodies and even chord changes that are much faster
> than this. Thus any attempt to say whether the chords were in JI or
> in 72-tET would be meaningless.
>
> Rebuttals?

It is true that the bin size for an FFT is independent of the
sample rate and the accuracy of measurement. Also I gather
there is quite a close mathematical connection between
the derivation of Fourier transforms and quantum mechanics,
though that is way outside my own field of specialism
in maths.

However, without needing to know the details, it is abundantly
clear from various observations and results
that this doesn't carry over in an exact fashion to music.

You can measure the frequencies of single partials much more accurately
than that using peak interpolation on the FFT, and I do it often. That
method gives about an extra order of magnitude of precision.
The idea is that instead of just looking for the highest peak in
the FFT, you also look at the bins to either side; by seeing which
side the second highest bin is on, and how much higher it is than
the third highest, you can find the position of the peak more
accurately:

      |
    | |
    | | |
  ..............
     ^

The estimated peak is a bit to the left of the highest bin.
This is an established technique. Commonly used
FFT peak interpolation methods include Jain's method
and Quinn's first and second estimators.
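
In case anyone wants to experiment, here is a minimal sketch of the idea
in Python with numpy. It uses plain quadratic interpolation on the log
magnitudes, a textbook variant rather than Jain's or Quinn's exact
formulas, and the 442.7 Hz test tone is just made up for illustration:

    import numpy as np

    def interpolated_peak_freq(signal, sample_rate):
        # Hann window, then magnitude spectrum
        spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
        k = int(np.argmax(spectrum[1:-1])) + 1     # highest bin, avoiding the edges
        a, b, c = np.log(spectrum[k - 1:k + 2])    # log magnitudes around the peak
        delta = 0.5 * (a - c) / (a - 2 * b + c)    # peak offset in bins, in (-0.5, 0.5)
        return (k + delta) * sample_rate / len(signal)

    # a 0.1 second, 442.7 Hz sine: the bins are 10 Hz apart, but the
    # interpolated estimate typically lands within a fraction of a hertz
    sr = 44100
    t = np.arange(sr // 10) / sr
    print(interpolated_peak_freq(np.sin(2 * np.pi * 442.7 * t), sr))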

You can get even better accuracy, limited only by the precision with
which the sample points are measured, using wave counting and interpolation
to find the exact positions of the zero crossings, in the case of
suitable waveforms. With the zero crossing interpolation method
one would get arbitrary accuracy if one were to increase the sample rate
and the precision with which samples are measured.

That's obvious really - if you have a sample rate of millions of
samples per second, then you are going to have an accuracy of
better than a millionth of a second for measuring the
places where the wave crosses the zero point. So, find two
such points roughly a second apart, count the number of waves between them,
find the exact time from one to the other, divide the
number of waves by the exact time, and there you are: a frequency
to extremely high accuracy.

We don't have such high sample rates - only tens of thousands of samples
per second. However, even then one can use interpolation to
find the zero position: if the wave crosses between two
sample points, plot the two points on a graph, join them
with a line, and see where that line crosses
the x axis. With 16 bit integers you have plenty of precision
to find that zero crossing point with great accuracy, and
so get an order of magnitude or so better frequency
measurement than one might expect without this technique:

|\
|  \
|    \
.......\......
        \|

Interpolated zero crossing point - between these two samples
and somewhat closer to the right hand one of the two.

Then, having measured the time so accurately this way, count
the number of waves in between as before, divide one by the
other and get an extremely accurate frequency measurement
- it is far better than FFT when the waveform is suitable.

A suitable waveform is simply one that is easy to count, with
periodically repeating crossing points - that is not guaranteed
for a musical tone, but many instruments are like that.
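
Here is a minimal sketch of the whole procedure in Python with numpy (not
my actual code; it assumes a clean waveform with one rising crossing per
cycle, and the 440.25 Hz test tone is invented):

    import numpy as np

    def zero_crossing_freq(signal, sample_rate):
        # samples just before each rising (negative to positive) crossing
        s = np.asarray(signal, dtype=float)
        rising = np.where((s[:-1] < 0) & (s[1:] >= 0))[0]
        # linear interpolation between the two samples around each crossing
        frac = s[rising] / (s[rising] - s[rising + 1])
        times = (rising + frac) / sample_rate
        # whole cycles between first and last crossing, over the exact time
        return (len(times) - 1) / (times[-1] - times[0])

    sr = 44100
    t = np.arange(sr) / sr                        # one second of samples
    print(zero_crossing_freq(np.sin(2 * np.pi * 440.25 * t), sr))  # ~440.25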

It is not so easy to automate what one could do by eye, as
the eye is very good at recognising the shapes of waveforms,
but I'm working on it. It can be automated easily for various
straightforward waveforms such as sine waves and triangular
waves etc.

As for FFT:

There seem to be interesting analogies between FFT and quantum
mechanics, but that is all. In QM you can only make a single
observation of the waveform, and when that is done, the
wave is gone, collapsed, and you can never find any more info
about it. In FFT the waveform is still there and you can
run any number of extra observations on it and refine your
measurements.

The thing is linear, isn't it? So if you make it 0.6
seconds instead of 6 seconds, one shouldn't be able to
distinguish notes even 40 cents apart, if those are the figures?

I'm not sure where your numbers come from. I make the FFT bin
size for 6 seconds a little under 0.2 Hz, and you
have to go down to 0.1 seconds to get a 40 cent
bin size at 440 Hz. But that is hardly an ultra short
note - one can't imagine a tenth of a second step of 40 cents
(nearly a quarter tone) will defeat anyone who is well
used to listening out for the pitches of notes.
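
To spell out the arithmetic:

bin size for a 6 second window: 1/6 = 0.167 Hz
40 cents at 440 Hz: 440 * (2^(40/1200) - 1) = 10.3 Hz
window with a 10.3 Hz bin size: 1/10.3 = 0.097, about 0.1 seconds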

Whatever the method the ear uses, it is clearly doing
a bit better than just finding the nearest FFT bin.

Robert

🔗LAFERRIERE François <francois.laferriere@cegetel.fr>

3/19/2002 8:43:32 AM

Paul wrote:
> > Much has been made of the 3- and 4- cent deviations from JI in 72-tET
> > chords. At a typical musical frequency of 440Hz, 4 cents is a 1Hz
> > deviation. So the classical uncertainty principle would seem to say
> > that, for frequency to be determined to better than this accuracy,
> > the note would have to be played for 2*pi, or over 6 seconds! Clearly
> > most music has melodies and even chord changes that are much faster
> > than this. Thus any attempt to say whether the chords were in JI or
> > in 72-tET would be meaningless.
> >
> > Rebuttals?

Sorry to contradict you Paul, I didn't notice the 2*pi factor on first
reading. In fact you mixed up angular frequency (in radians/s) and time
frequency (in Hz). So the relationship between the precision deltaF (in
Hz) and the length of the window T is simply deltaF = 1/T (not 2*pi/T).
Apart from this 2*pi factor, Paul is basically correct about the classical
uncertainty principle.

Robert wrote:
> It is true that the bin size for an FFT is independent of the
> sample rate and the accuracy of measurement. Also I gather
> there is quite a close mathematical connection between
> the derivation of Fourier transforms and quantum mechanics,
> though that is way outside my own field of specialism
> in maths.

When I worked in the field of digital signal processing, I banged my head against
this 1/T relationship for a long time before accepting it as a fact of life.
So I do not think I can convince Robert in a few sentences. And I am not
especially a good teacher...

> However, without needing to know the details, it is abundantly
> clear from various observations and results
> that this doesn't carry over in an exact fashion to music.

Sorry to be down to earth, but music is not different from any other signal.

> You can measure frequencies of single partials much more accurately
> than that using peak interpolation for FFT, and I do it often. That
> method gives about an extra order of magnitude of precision.
> The idea is that instead of just looking for the highest peak in
> the FFT, you also look at the ones to either side, and then
> by seeing which side the second highest peak is, and how much
> higher it is than the third highest, you can find the position more
> accurately.
>
>       |
>     | |
>     | | |
>   ..............
>      ^

I use this method myself to improve a little my estimate of the central
frequency of sharp peaks.
You can indeed find a value that way, correct, but the confidence interval
still remains 1/T. If you take the FFT of a pure sine, you may
find, by interpolation, the original frequency with some accuracy, but
that is an artifact of the way the dataset was produced, information that
you know a priori. This method cannot narrow the confidence interval
for a natural signal.

>
> The estimated peak is a bit to the left of the highest bin.
> This is an established technique. Commonly used
> FFT peak interpolation methods include Jain's method
> and Quinn's first and second estimators.

This relies on the (reasonable) assumption that the peak is symmetrical. That is
probably true to some extent, but not to infinite precision.

> You can get even better accuracy, limited only by the precision with
> which the sample points are measured, using wave counting and interpolation
> to find the exact positions of the zero crossings, in the case of
> suitable waveforms. With the zero crossing interpolation method
> one would get arbitrary accuracy if one were to increase the sample rate
> and the precision with which samples are measured.

The zero crossing rate pitch evaluation method is not a general method (it is
necessary to make some assumptions about the waveform's properties). Nevertheless,
it may help to explain the uncertainty principle. Let's say that you have N
and "something" crossings per second. The "something" cannot be known to a
precision better than 1/T, unless you assume that interpolation is
reliable. But in fact it is not, and I will try to explain why.

> That's obvious really - if you have a sample rate of millions of
> samples per second, then you are going to have an accuracy of
> better than a millionth of a second for measuring the
> places where the wave crosses the zero point. So, find two
> such points roughly a second apart, count the number of waves between them,
> find the exact time from one to the other, divide the
> number of waves by the exact time, and there you are: a frequency
> to extremely high accuracy.

It is falsely obvious. If we had a sample rate of a million samples per second
we would have a much higher cutoff frequency (500 kHz), but in doing so we
would allow higher frequencies into the signal. Those frequencies
modify the time domain amplitude of the signal in such a way
that each gain in cutoff frequency COUNTERBALANCES EXACTLY the precision
that could have been gained from the extra samples. Each time you increase the
number of samples, you may have the impression that you increase the
interpolation accuracy, but each time, high frequencies kick in to make
the interpolation as bad as before. Thus, having no a priori knowledge of the
amplitudes of the high frequencies, a rise in the sample frequency provides
extra information about those high frequencies, but gives no more information about
the low frequency part of the spectrum.

Having a higher sampling frequency would change absolutely nothing!!

The fact that we are talking about digital signals helps make this falsely
obvious. The deltaF = 1/T relationship was known from continuous signal
theory before AD/DA converters were widely available.
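
To illustrate the point numerically, here is a quick sketch in Python with
numpy, reusing the zero crossing estimator sketched earlier in the thread
(the weak partial at 5070.3 Hz is an arbitrary invention of mine):

    import numpy as np

    def zero_crossing_freq(signal, sample_rate):
        # same estimator as sketched earlier in the thread
        s = np.asarray(signal, dtype=float)
        rising = np.where((s[:-1] < 0) & (s[1:] >= 0))[0]
        frac = s[rising] / (s[rising] - s[rising + 1])
        times = (rising + frac) / sample_rate
        return (len(times) - 1) / (times[-1] - times[0])

    # a 440 Hz tone plus a weak partial at an unrelated frequency
    for sr in (44100, 441000):
        t = np.arange(sr) / sr                    # one second
        s = np.sin(2 * np.pi * 440 * t) + 0.05 * np.sin(2 * np.pi * 5070.3 * t)
        print(sr, zero_crossing_freq(s, sr))
    # both estimates come out essentially the same: the extra samples pin
    # down the crossings of the composite wave, not of the 440 Hz partial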

>
> We don't have such high sample rates; only tens of thousands
> of samples
> per second. However even then, one can use interpolation to
> find the zero position - if the wave crosses between two
> sample points, then plot the two points on a graph, and
> join them together with a line, and see where it crosses
> the x axis. With 16 bit integers you have plenty of precision
> to find that zero crossing point with great accuracy, and
> so get an order of magnitude or so better frequency
> measurement than one might expect without this technique.
>
>
> |\
> |  \
> |    \
> .......\......
>         \|
>
> Interpolated zero crossing point - between these two samples
> and somewhat closer to the right hand one of the two.

The precision of the samples can do nothing to increase the accuracy, as long as
the amplitudes of the frequencies other than the one we are looking for are as
disturbing as they were before.

> Then, having measured the time so accurately this way, count
> the number of waves in between as before, divide one by the
> other and get an extremely accurate frequency measurement
> - it is far better than FFT when the waveform is suitable.
>
> A suitable waveform is simply one that is easy to count, with
> periodically repeating crossing points - that is not guaranteed
> for a musical tone, but many instruments are like that.
>
> It is not so easy to automate what one could do by eye, as
> the eye is very good at recognising the shapes of waveforms,
> but I'm working on it. It can be automated easily for various
> straightforward waveforms such as sine waves and triangular
> waves etc.
>
> As for FFT:
>
> There seem to be interesting analogies between FFT and quantum
> mechanics, but that is all. In QM you can only make a single
> observation of the waveform, and when that is done, the
> wave is gone, collapsed, and you can never find any more info
> about it. In FFT the waveform is still there and you can
> run any number of extra observations on it and refine your
> measurements.

It is worth noting that this principle is not specific to the FFT but
applies equally to a whole family of transformations from the time (or
space) domain into the frequency domain.

> The thing is linear, isn't it? So if you make it 0.6
> seconds instead of 6 seconds, one shouldn't be able to
> distinguish notes even 40 cents apart, if those are the figures?
>

The figures proposed by Paul are systematically wrong, but by less than an order of
magnitude. In fact it is 4 cents for 440 Hz and a 1 second duration. It seems
to me compatible with the classical harmony concept of a "transitional chord",
but perhaps that is too far-fetched?

> I'm not sure where your numbers come from. I make the FFT bin
> size for 6 seconds a little under 0.2 Hz, and you
> have to go down to 0.1 seconds to get a 40 cent
> bin size at 440 Hz. But that is hardly an ultra short
> note - one can't imagine a tenth of a second step of 40 cents
> (nearly a quarter tone) will defeat anyone who is well
> used to listening out for the pitches of notes.
>

My figure comes from a straightforward computation:

ln(441/440) / (ln(2)/1200) = 4 cents

BTW, for small deltaF with respect to F, the approximation

(deltaF/F) / (ln(2)/1200) is nearly as good
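
Numerically: ln(441/440) = 0.0022702 and ln(2)/1200 = 0.0005776, so the
ratio is 3.93, about 4 cents; the approximation gives
(1/440) / 0.0005776 = 3.94, nearly the same.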

> Whatever the method the ear uses, it is clearly doing
> a bit better than just finding the nearest FFT bin.

The ear-brain is a subtle engine that may use a lot of tricks to recognize
complex patterns, but it is nevertheless subject to physical laws.

yours truly

François Laferrière

🔗graham@microtonal.co.uk

3/19/2002 9:19:00 AM

LAFERRIERE François wrote:

> Sorry to be down to earth, but music is not different from any other
> signal.

A typical (synthesised, harmonic) timbre is atypical in two important
respects:

1) It's periodic

2) It's dominated by low partials

> I use this method myself to improve a little my estimate of the central
> frequency of sharp peaks.
> You can indeed find a value that way, correct, but the confidence interval
> still remains 1/T. If you take the FFT of a pure sine, you may
> find, by interpolation, the original frequency with some accuracy, but
> that is an artifact of the way the dataset was produced, information that
> you know a priori. This method cannot narrow the confidence interval
> for a natural signal.

The information you know a priori is that you're dealing with a periodic
signal.

> This relies on the (reasonable) assumption that the peak is symmetrical. That is
> probably true to some extent, but not to infinite precision.

You'll get something approaching a Gaussian, won't you?

> The zero crossing rate pitch evaluation method is not a general method (it is
> necessary to make some assumptions about the waveform's properties).

Yes, the assumption is that the waveform has weak high partials.

> It is falsely obvious. If we had a sample rate of a million samples per second
> we would have a much higher cutoff frequency (500 kHz), but in doing so we
> would allow higher frequencies into the signal.

Only if higher frequencies were present in the original (analog) signal.
The weaker they are, the weaker the effect.

> The fact that we are talking about digital signals helps make this falsely
> obvious. The deltaF = 1/T relationship was known from continuous signal
> theory before AD/DA converters were widely available.

Again, you say something is "falsely obvious". As Robert is able to get
consistent results beyond the deltaF restriction, it should be obvious
that his method is not false.

Graham

🔗Robert Walker <robertwalker@ntlworld.com>

3/19/2002 10:24:33 AM

Hi François,

> This relies on the (reasonable) assumption that the peak is symmetrical. That is
> probably true to some extent, but not to infinite precision.

Yes, probably this is why it works. Also, it isn't true to infinite precision,
but it is true enough to get some improvement over the nearest FFT bin size.

> You can indeed find a value that way, correct, but the confidence interval
> still remains 1/T. If you take the FFT of a pure sine, you may
> find, by interpolation, the original frequency with some accuracy, but
> that is an artifact of the way the dataset was produced, information that
> you know a priori. This method cannot narrow the confidence interval
> for a natural signal.

If you are analysing noise, probably yes. But if it is a musical instrument,
the peaks are fairly symmetrical, if one looks at a detailed FFT
of a long note. So it works well in that case.

One is presumably relying on some special properties of the types of sound
used to make clearly pitched musical notes, but for those sounds it does
narrow the confidence interval.

One can demonstrate that it works. If you find the frequency of a constant
pitch musical note using a short sample of, say, 0.1 seconds with peak
interpolation, then do another analysis of 1 second to find the nearest FFT
bin by the usual non-interpolated method, you end up with the
same frequency as you got using the peak interpolation method on the short sample.
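
Here is the kind of comparison I mean, as a sketch in Python with numpy
(the 443.7 Hz tone and the sample rate are made up for illustration; a
real note would have harmonics, but the idea is the same):

    import numpy as np

    def peak_bin_freq(signal, sample_rate):
        # nearest-bin estimate: the centre frequency of the highest FFT bin
        spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
        return np.argmax(spectrum) * sample_rate / len(signal)

    def interpolated_peak_freq(signal, sample_rate):
        # quadratic interpolation over the peak bin and its two neighbours
        spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
        k = int(np.argmax(spectrum[1:-1])) + 1
        a, b, c = np.log(spectrum[k - 1:k + 2])
        return (k + 0.5 * (a - c) / (a - 2 * b + c)) * sample_rate / len(signal)

    sr, f0 = 44100, 443.7
    t = np.arange(sr) / sr                                # one second
    tone = np.sin(2 * np.pi * f0 * t)
    print(interpolated_peak_freq(tone[:sr // 10], sr))    # 0.1 s, interpolated: ~443.7
    print(peak_bin_freq(tone, sr))                        # 1 s, nearest 1 Hz bin: 444.0
    # the short interpolated estimate falls within the 1 Hz bin found at 1 second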

Confidence intervals are based on statistics - what is the probability that
the actual pitch of the note is within so many standard deviations
of the measured value. So if you find that a method increases the
probability of being within a particular range of the target pitch,
that means you have reduced the confidence interval, and improved the
accuracy. That is just the definition of a confidence interval.

So this shows that peak interpolation reduces the confidence interval,
and in my experience, it does so by about an order of magnitude
(factor of ten) for most musical timbres.

If there is some theory that says that this is impossible, then
it isn't in accord with results, and needs to be revised -
experiment should drive theory rather than the other way round.

Also, what about perceiving a quarter tone pitch change
using notes of a tenth of a second? That isn't at all controversial,
I'd have thought. It is very easy for many people to do.

My guess is that _if_ the ear does rely on FFT-like methods then
it is using some type of peak interpolation, and those who
are particularly good at discriminating pitch have learnt to
interpolate better. I assume that they are still hearing much the same
way as anyone else - because physically we all have pretty
much the same perception. Except of course, maybe some will
lack the fine hairs altogether and be deaf, or have very
few of them and maybe be unable to discriminate pitch
at all well.

(Some who can't distinguish pitch at all well, even notes
a semitone apart, may just have never learnt to give it
particular attention, and may be physically able to if they
were to do some training).

I imagine fine pitch discrimination is something that can be learnt,
e.g. by spending many hours a day tuning a musical instrument;
one would probably increase one's pitch perception by gradually
learning to do the peak interpolation better.

As for the zero crossing rate - to get it to work you
need to be able to measure the exact time of the zero
crossings.

If you have a pure sine wave, or nearly so, with no higher
harmonics, then it will work; and even if you
increase the sample rate, if there are no higher
harmonics in the waveform to find, then there
is nothing to perturb the positions of the zero
crossings.

Just to explain it a bit more clearly.

Suppose you have 500 waves, and you
measure the exact time of the first
zero crossing, and the last one, and the
time interval between the two is, say
0.851323 seconds, or whatever it is.

Just divide one by the other,

500/0.851323

to get a frequency of

587.321146

Then if you work out the time interval to
greater precision, the frequency is more accurate.

This can be done even with a single wave -
measure the exact time for a single wave
to great accuracy and you get a good frequency
determination.

The only way I can imagine this mightn't work is if
you can't measure the waveform accurately in the first
place - if there is some inaccuracy in the measurement
of the samples. But, we aren't talking about quantum
mechanics here, where there is some fundamental
limitation on what can be measured.

Improve the accuracy of the measurement of the individual
samples, and increase the sample rate, and you
will improve the precision with which you know
the waveform.

(presumably there is some limit eventually when you
meet the discrete nature of air as made up of individual
molecules, but with something like 10^27
molecules involved, that is probably not going to
be reached easily).

Now if the waveform is better known, and it has
no higher harmonics in it, then you get a
very accurate count.

Even if it has higher harmonics, if those are of
lower amplitude than the first harmonic, then
they don't in practice affect the position of the
zero crossing noticeably.

Even if they are of higher amplitude than the first
harmonic, but are at exact multiples of its frequency,
that doesn't affect the zero crossing
method appreciably either, as harmonics at
exact multiples of the frequency of the fundamental
don't perturb the positions of the zero crossings
at all. All harmonic timbres
fall into this category.
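
To illustrate with a made-up harmonic timbre (a sketch, reusing the
estimator from before; the 329.63 Hz fundamental and the 1/n^2 amplitudes
are arbitrary, chosen so that there is still one rising crossing per
cycle):

    import numpy as np

    def zero_crossing_freq(signal, sample_rate):
        # same estimator as sketched earlier in the thread
        s = np.asarray(signal, dtype=float)
        rising = np.where((s[:-1] < 0) & (s[1:] >= 0))[0]
        frac = s[rising] / (s[rising] - s[rising + 1])
        times = (rising + frac) / sample_rate
        return (len(times) - 1) / (times[-1] - times[0])

    sr = 44100
    t = np.arange(sr) / sr                        # one second
    f0 = 329.63                                   # hypothetical fundamental
    # partials at exact multiples of f0, with amplitudes falling off as 1/n^2
    s = sum(np.sin(2 * np.pi * n * f0 * t) / n**2 for n in range(1, 8))
    print(zero_crossing_freq(s, sr))              # recovers f0 very closely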

So the wave counting method has a lot of promise.
In practice it seems to do even better than
FFT with peak interpolation for suitable waveforms
- maybe getting on for another order of magnitude improvement
in pitch recognition if you are lucky.

I'd be interested to know what it is that signal
processing theory has proved.

It seems at any rate that with musical notes and typical
musical timbres, e.g. reasonably harmonic ones (and I expect
some inharmonic ones too, if they aren't just a kind of
coloured noise), one can go beyond the limitations
by one or two orders of magnitude. With the wave crossing
method, in principle there is no limitation at all if
the waveform is a harmonic timbre, as the higher harmonics
don't perturb the positions of the zero crossings.

Robert

🔗paulerlich <paul@stretch-music.com>

3/19/2002 12:58:03 PM

--- In tuning@y..., LAFERRIERE François <francois.laferriere@c...> wrote:
> Paul wrote:
> > > Much has been made of the 3- and 4- cent deviations from JI in 72-tET
> > > chords. At a typical musical frequency of 440Hz, 4 cents is a 1Hz
> > > deviation. So the classical uncertainty principle would seem to say
> > > that, for frequency to be determined to better than this accuracy,
> > > the note would have to be played for 2*pi, or over 6 seconds! Clearly
> > > most music has melodies and even chord changes that are much faster
> > > than this. Thus any attempt to say whether the chords were in JI or
> > > in 72-tET would be meaningless.
> > >
> > > Rebuttals?
>
> Sorry to contradict you Paul, I didn't notice the 2*pi factor on first
> reading. In fact you mixed up angular frequency (in radians/s) and time
> frequency (in Hz). So the relationship between the precision deltaF (in
> Hz) and the length of the window T is simply deltaF = 1/T (not 2*pi/T).
> Apart from this 2*pi factor, Paul is basically correct about the classical
> uncertainty principle.

in my original message, which appeared many months ago, i was citing
a website which was using a particular definition of uncertainty.
perhaps that's what you guys should be replying to, if you can find
it.

🔗paulerlich <paul@stretch-music.com>

3/19/2002 2:23:45 PM

hi robert,

would you mind subjecting the 'jerries' to your analysis algorithm?
it'll be interesting to see what cents values you come up with.
perhaps this will show francois the utility of your methods better
than any verbal explanation?

see gene, the possible uses of the 'jerries' keep multiplying! :)