RE: Pitch Detection

Robert Walker <robertwalker@ntlworld.com>

3/21/2002 4:19:37 PM

Hi Paul,

As you'll see from my SD measurements, the SD seems to be
about 0.5 cents, so occasional values 1 cent out are to
be expected.

The method of taking account of all the partials and using
those to adjust the frequency of the fundamental seems
to gain nearly another order of magnitude in precision.
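
A minimal sketch of that idea, assuming the partials are exactly harmonic, are listed in order from the fundamental upwards with none missing, and are weighted evenly. The function name and the sample frequencies are made up for illustration; this is not the FTS code.

```python
def fundamental_from_partials(partial_freqs):
    """Estimate the fundamental from measured partial frequencies (Hz).

    Assumes partial_freqs[0] is the fundamental, partial_freqs[1] the
    2nd harmonic, and so on. Each partial is divided by its harmonic
    number, and the resulting estimates are averaged with equal weight.
    """
    estimates = [f / n for n, f in enumerate(partial_freqs, start=1)]
    return sum(estimates) / len(estimates)


# Hypothetical measurements of the first four partials of one note.
measured = [220.1, 439.9, 660.3, 879.8]
f0 = fundamental_from_partials(measured)
```

An error of a given size in Hz on the n-th partial shrinks by a factor of n when divided down, which is why the higher partials can sharpen the estimate.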

BTW the earlier results using Jain's method may seem poor,
but they are far better than an FFT using just the nearest bin.
I used to need to do an FFT over a recording of, say, a minute
or so in order to get decent frequency measurements using the
nearest bin approach.

This is a good exercise for the new FFT features in
FTS actually, and as you see, has led to a new idea for
improving the accuracy of the FFT frequency detection.

Thanks for the suggestion to try it :-).

Robert

Robert Walker <robertwalker@ntlworld.com>

3/22/2002 10:02:09 PM

Hi François

> > Perhaps weighting them all evenly is a reasonable compromise between
> > preferring the higher ones as the more exact measurements,
> > and preferring
> > the lowest ones as possibly more likely to be exact multiples of the
> > pitch for real timbres.

> That looks like an excellent idea!! I will probably try to use it on my
> natural voice analysis. Up to now, I made no such weighting, I made a manual
> confidence interval estimation based on the broadness of the peak, divide it
> by N (the harmonic number) and take the "most reliable". Perhaps the
> weighting method you propose may be of some help to increase the accuracy of
> my measurement.

That's an interesting idea. I may well adopt it too.

The approaches could be combined, e.g. weight
the partials inversely by the width of the peak. One can measure that as
the width at, say, 20 dB below the peak, or whatever.
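
One way to measure such a width, as a rough sketch. It assumes `mags` holds linear FFT amplitudes (so 20 dB down is a factor of 10) and reports the width as a count of bins; the function name and the sample values are hypothetical.

```python
def peak_width_bins(mags, peak, drop_db=20.0):
    """Count the contiguous bins around `peak` that stay within
    `drop_db` decibels of the peak amplitude.

    Assumes `mags` are linear amplitudes, so the threshold is
    peak_amplitude * 10 ** (-drop_db / 20).
    """
    threshold = mags[peak] * 10 ** (-drop_db / 20.0)
    lo = peak
    while lo > 0 and mags[lo - 1] >= threshold:
        lo -= 1
    hi = peak
    while hi < len(mags) - 1 and mags[hi + 1] >= threshold:
        hi += 1
    return hi - lo + 1


# Hypothetical amplitudes around a peak at index 2.
width = peak_width_bins([0.01, 0.2, 1.0, 0.5, 0.05], peak=2)
```

Multiplying the bin count by the bin spacing in Hz would give the width as a frequency.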

I'll probably eventually add various weighting schemes, plus also an option
to restrict to first n partials for user defined n.
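
A sketch of what such options might look like, assuming weights inversely proportional to the peak width and an optional cut-off at the first n partials. The function and parameter names are invented for illustration and other weighting schemes are equally possible.

```python
def weighted_fundamental(partial_freqs, peak_widths, max_partials=None):
    """Weighted mean of partial_freqs[k] / (k + 1).

    Each partial is weighted by 1 / peak_widths[k], so narrow peaks
    count for more. If max_partials is given, only the first
    max_partials partials are used.
    """
    n_used = len(partial_freqs)
    if max_partials is not None:
        n_used = min(n_used, max_partials)
    weighted_sum = 0.0
    total_weight = 0.0
    for n in range(1, n_used + 1):
        weight = 1.0 / peak_widths[n - 1]
        weighted_sum += weight * partial_freqs[n - 1] / n
        total_weight += weight
    return weighted_sum / total_weight
```

With all widths equal this reduces to the evenly weighted mean.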

With the mean value technique the broader peaks may also be more reliable than
they were with the other peak interpolation methods, as it depends on as much
of the peak as one wants to include.

- e.g. sometimes if you look at the top of a broad peak you will find a
series of small peaklets (to coin a word) or summits, and then the
highest of those is often off-centre.

When one takes account of more of the population of the broad peak, then
you find that the frequency detected may be between two of the small
peaklets.

Another idea is that instead of taking the FFT values as the population,
it may be better to take the log of the FFT values.

Another thing I will probably add in is to look for stretched or compressed
partials. The idea is that one could measure how much the partials get
stretched per octave and use that to divide the partials down to the same
frequency as the fundamental.
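
A sketch of one way to do this, under a loudly stated assumption: partial n is modelled as sitting at n * f0 * s ** log2(n), where s is a constant stretch ratio per octave. That model, and the names below, are illustrative guesses rather than anything settled in this thread. Under the model, log2(f_n / n) is a straight line in log2(n), so a least-squares fit gives both the stretch and the fundamental.

```python
import math


def fit_stretch(partial_freqs):
    """Fit f_n = n * f0 * s ** log2(n) to measured partials.

    Assumes partial_freqs[0] is partial 1, partial_freqs[1] is
    partial 2, etc., with at least two partials.
    Returns (f0 in Hz, stretch in cents per octave).
    """
    xs = [math.log2(n) for n in range(1, len(partial_freqs) + 1)]
    ys = [math.log2(f / n) for n, f in enumerate(partial_freqs, start=1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # log2 of the stretch per octave
    intercept = mean_y - slope * mean_x  # log2 of the fundamental
    return 2 ** intercept, 1200.0 * slope


# Synthetic partials: 100 Hz fundamental, stretched 2 cents per octave.
synthetic = [n * 100.0 * 2 ** (2.0 / 1200.0 * math.log2(n)) for n in range(1, 9)]
f0_fit, cents_per_octave = fit_stretch(synthetic)
```

A negative result for the stretch would indicate compressed partials.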

I wonder how clear the idea of taking the mean of the population is?

The idea is that you think of the FFT values near a peak (e.g. down to
10 percent of the height of the peak) as a population distribution.

You could think of each FFT value as a number of votes for that
particular frequency. So to find the total, you multiply the frequency
of each bin by the number of votes for that frequency. Add the results for
all the bins. Then at the end, divide by the total number of votes to
get the mean value.
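
The voting description above can be sketched as follows. The 10 percent floor comes from the example in the text; the function name, the argument layout (parallel lists of bin frequencies and FFT magnitudes) and the sample numbers are assumptions for illustration.

```python
def peak_mean_frequency(freqs, mags, peak, floor_fraction=0.1):
    """Mean frequency of the 'population' around one FFT peak.

    Takes the contiguous bins around `peak` whose magnitude is at
    least floor_fraction of the peak height, treats each magnitude
    as a number of votes for that bin's frequency, and returns the
    vote-weighted mean.
    """
    floor = mags[peak] * floor_fraction
    lo = peak
    while lo > 0 and mags[lo - 1] >= floor:
        lo -= 1
    hi = peak
    while hi < len(mags) - 1 and mags[hi + 1] >= floor:
        hi += 1
    votes = mags[lo:hi + 1]
    weighted = sum(f * v for f, v in zip(freqs[lo:hi + 1], votes))
    return weighted / sum(votes)


# Hypothetical bins 1 Hz apart; the peak leans towards the upper side.
bin_freqs = [100.0, 101.0, 102.0, 103.0, 104.0]
bin_mags = [0.0, 0.2, 1.0, 0.6, 0.0]
estimate = peak_mean_frequency(bin_freqs, bin_mags, peak=2)
```

The estimate lands between bin centres whenever the votes on the two sides of the peak are unequal, which is how the mean can resolve finer than the bin spacing.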

I wonder if that is how we beat the pitch resolution limit
- rather easier than peak interpolation.

Then, since we hear logarithmically in volume, and are pretty
good at detecting pitch (and since it is the way that we hear pitch
that is of most interest to us anyway), I wonder if one might
get better results by using the logarithm of the number of votes
for each frequency bin instead.
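
A sketch of the logarithmic variant. The log of a magnitude below 1 is negative, so some reference level is needed; here, as an assumption of this sketch only, each bin's votes are log(magnitude / floor), i.e. its level above the cut-off. Names and the cut-off are illustrative.

```python
import math


def peak_mean_frequency_log(freqs, mags, peak, floor_fraction=0.1):
    """Like a vote-weighted mean, but with votes = log(mag / floor).

    Bins exactly at the floor get zero votes; bins below it are
    excluded, so every vote count is non-negative.
    """
    floor = mags[peak] * floor_fraction
    lo = peak
    while lo > 0 and mags[lo - 1] >= floor:
        lo -= 1
    hi = peak
    while hi < len(mags) - 1 and mags[hi + 1] >= floor:
        hi += 1
    votes = [math.log(m / floor) for m in mags[lo:hi + 1]]
    weighted = sum(f * v for f, v in zip(freqs[lo:hi + 1], votes))
    return weighted / sum(votes)
```

Whether this actually improves on linear votes would need testing against known pitches.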

Another thought: if one can't go down far enough because
several peaks are close together, then one should go down only as far
as the shallowest valley (on either side) and then use that level
as the minimum number of votes to take account of. Otherwise,
a peak that sits close to another one will get biased, because
the reading will include some of the neighbouring peak's values too.

In fact, probably one should stop a bit short of the valley, as it
is often getting asymmetrical by that point.
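
A sketch of the valley rule, assuming the valleys are found by walking downhill from the peak on each side, the shallowest (highest) valley sets the cut-off, and "a bit short" means raising that cut-off by a margin factor. The factor of 1.25 and all names are arbitrary choices for illustration.

```python
def valley_floor(mags, peak, margin=1.25):
    """Cut-off level for one peak: the higher of its two neighbouring
    valleys, raised by `margin` so as to stop short of the valley."""
    lo = peak
    while lo > 0 and mags[lo - 1] < mags[lo]:
        lo -= 1
    hi = peak
    while hi < len(mags) - 1 and mags[hi + 1] < mags[hi]:
        hi += 1
    return max(mags[lo], mags[hi]) * margin


def mean_frequency_above_valley(freqs, mags, peak, margin=1.25):
    """Vote-weighted mean over the bins around `peak` that are at or
    above the valley-based cut-off."""
    floor = valley_floor(mags, peak, margin)
    lo = peak
    while lo > 0 and mags[lo - 1] >= floor:
        lo -= 1
    hi = peak
    while hi < len(mags) - 1 and mags[hi + 1] >= floor:
        hi += 1
    votes = mags[lo:hi + 1]
    return sum(f * v for f, v in zip(freqs[lo:hi + 1], votes)) / sum(votes)


# Hypothetical spectrum: main peak at index 2, a neighbour at index 5.
spec_freqs = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0]
spec_mags = [0.1, 0.4, 1.0, 0.5, 0.3, 0.8, 0.2]
result = mean_frequency_above_valley(spec_freqs, spec_mags, peak=2)
```

In this example the valley at index 4 sets the cut-off, so neither it nor the neighbouring peak at index 5 contributes votes to the main peak.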

Anyway I'm sure there will be lots of subtleties to it eventually,
but it seems to be on the right track somehow...

> > looks like you're nailing it now. note that this does not violate the
> > classical uncertainty principle, since here it's *given* that the
> > signal is to be interpreted as frequencies unchanging over time
> > (while in many situations that wouldn't be the case), plus it's
> > *given* that the partials are exactly harmonic (while in many
> > situations *that* wouldn't be the case).
> >
> > congratulations on your achievement here!

> I can only share Paul's enthusiasm for your excellent work on this.

Thanks to you both - it's appreciated.

Robert