
Pitch detection

Robert Walker <robertwalker@ntlworld.com>

3/20/2002 6:49:52 AM

Hi Graham,

Your remark about the Gaussian got me thinking
about distributions, and means, and I hit on a
new idea for peak detection.

The thing is that these are all three-point methods:
they use only three data points. One wonders whether using
more than three would increase the accuracy, which seems
likely. Speed isn't a consideration here, because
one need only do the exact peak positioning
for ones that are already selected as peaks
(or at least, as I do it now, at a preliminary stage in the
processing to find likely peaks).

If you look at the peaks found using
Jain's method, you can often see that they
are to one side or the other of the position
one would pick out by eye. This isn't so surprising,
as the method uses only three values to find the peak.
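For comparison, here is a minimal sketch of a common three-point estimator, quadratic (parabolic) interpolation, which fits a parabola through the largest bin and its two neighbours. This is not necessarily Jain's method, whose formula isn't given in this thread; the function name `parabolic_peak` and the assumption of linear magnitudes are illustrative only.

```python
def parabolic_peak(mags, k):
    """Fractional bin index of the peak near bin k, from a parabola fitted
    through the three points k-1, k, k+1.
    Assumes 0 < k < len(mags) - 1 and that mags[k] is a local maximum."""
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    denom = a - 2 * b + c
    if denom == 0:  # no curvature (flat top): keep the bin centre
        return float(k)
    return k + 0.5 * (a - c) / denom


# Three samples of y = 10 - (x - 0.3)**2 at x = -1, 0, 1, stored at bins 0..2
print(parabolic_peak([8.31, 9.91, 9.51], 1))
```

Multiplying the fractional index by the bin size gives a frequency. Only three values enter the formula, so any asymmetry further out in the peak is ignored, which is the limitation described above.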

So I had the idea of going down to, say, 5%
(or whatever) of the height of the peak on either
side, treating the FFT values in that range as the population,
and finding the mean value.
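A minimal sketch of this idea in its simplest form, before any interpolation of the crossing points: walk outwards from the peak bin while the values stay at or above 5% of the peak height, then take the magnitude-weighted mean of the bin indices. The function name and the assumption that `mags` holds linear FFT magnitudes (not dB) are illustrative choices, not taken from FTS.

```python
def centroid_peak(mags, k, frac=0.05):
    """Fractional bin position of the peak at index k, computed as the
    magnitude-weighted mean of the contiguous bins >= frac * mags[k]."""
    threshold = frac * mags[k]
    lo = k
    while lo > 0 and mags[lo - 1] >= threshold:  # extend to the left
        lo -= 1
    hi = k
    while hi < len(mags) - 1 and mags[hi + 1] >= threshold:  # extend to the right
        hi += 1
    bins = range(lo, hi + 1)
    total = sum(mags[i] for i in bins)
    return sum(i * mags[i] for i in bins) / total


print(centroid_peak([0, 0, 2, 10, 6, 0, 0], 3))  # skewed peak: pulled right of bin 3
```

If two peaks are close enough that the spectrum never drops below the threshold between them, the walk will run into the neighbouring peak, so the choice of threshold matters.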

I also did a bit extra to cope with the fact that
if you join the points (e.g. using straight lines
as a first approximation), the point at which the curve
crosses the 5% level will lie between two data
points. Anyway, I did something reasonable there.
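The post doesn't say exactly what was done at the edges, so the following is only one possible approach, assumed for illustration: join the bins with straight lines, cut the curve where it crosses the threshold, and take the centroid of the area under the resulting piecewise-linear curve. The function names are hypothetical and the magnitudes are assumed to be linear.

```python
def crossing(mags, inside, outside, threshold):
    """Fractional index between adjacent bins `inside` (>= threshold) and
    `outside` (< threshold) where the straight line joining them hits threshold."""
    y_in, y_out = mags[inside], mags[outside]
    t = (y_in - threshold) / (y_in - y_out)
    return inside + t * (outside - inside)


def linear_centroid_peak(mags, k, frac=0.05):
    """Centroid of the area under the piecewise-linear spectrum between the
    two points where it crosses frac * mags[k]."""
    threshold = frac * mags[k]
    lo = k
    while lo > 0 and mags[lo - 1] >= threshold:
        lo -= 1
    hi = k
    while hi < len(mags) - 1 and mags[hi + 1] >= threshold:
        hi += 1
    xs = [float(i) for i in range(lo, hi + 1)]
    ys = [float(mags[i]) for i in range(lo, hi + 1)]
    if lo > 0:  # add the left-hand threshold crossing as an extra vertex
        xs.insert(0, crossing(mags, lo, lo - 1, threshold))
        ys.insert(0, threshold)
    if hi < len(mags) - 1:  # and the right-hand one
        xs.append(crossing(mags, hi, hi + 1, threshold))
        ys.append(threshold)
    area = moment = 0.0
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        h = x1 - x0
        area += h * (y0 + y1) / 2                                  # trapezoid area
        moment += h * (x0 * (2 * y0 + y1) + x1 * (y0 + 2 * y1)) / 6  # integral of x*y
    return moment / area


print(linear_centroid_peak([0, 0, 2, 10, 6, 0, 0], 3))
```

Measuring the area down to zero rather than down to the threshold is another arbitrary choice here; either is defensible.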

Here are the results:
.........................................

Fractal Tune Smithy Custom voice or FFT results As Timbre Partials

Freq. (Hz) Amp. (dB) Cents

233.925368 99.7772512 0
292.481755 100 386.757972
350.908735 99.320157 702.057048
467.952108 85.0273321 1200.37508
584.983612 85.63987 1586.81747
701.925861 87.2793031 1902.3244
877.336224 77.6246343 2288.49782
935.798521 73.6975563 2400.17955
1052.73478 78.5803097 2604.02615
1169.6912 73.4924172 2786.40898
1403.82061 68.8110042 3102.28604
1462.2193 69.5782544 3172.84737
1637.95775 63.7875717 3369.3335
1754.63814 73.7779475 3488.46397
1871.51156 61.5131567 3600.10048
2047.39581 63.7343926 3755.60377
2105.60418 64.2014453 3804.13684
2339.64795 55.7295876 3986.60548
2456.64434 63.663738 4071.08254
2573.95612 55.9223244 4151.84068
2632.03212 58.2986596 4190.46825
2807.45928 63.9440635 4302.17384
2924.47242 56.8990651 4372.86739
3041.32035 52.0982817 4440.69308
3158.58298 58.8073264 4506.1887
3216.60706 55.9111944 4537.70338
3275.93668 51.3327412 4569.3447
3509.3874 62.4252546 4688.51878
3743.03987 49.2280227 4800.10823
3801.44419 53.1985153 4826.91286
3860.23497 55.018926 4853.48213
3968.51739 47.3203941 4901.37588
4094.58557 51.1834014 4955.51665
4211.5208 53.6240329 5004.26528
4386.87147 49.9039761 5074.88665
4445.60473 46.4831365 5097.91135
4561.87305 52.6391471 5142.60729
4678.98604 52.1244402 5186.49084
4912.7788 46.8231402 5270.90287
4971.37529 47.8706165 5291.42975
5148.49002 43.8881078 5352.03497
5263.46733 48.9118325 5390.27192
5557.09022 46.146928 5484.25132
5614.98615 49.3017419 5502.19469
5966.3571 48.1895215 5607.27612
6141.33306 43.5094321 5657.31794
6316.76146 50.2959451 5706.07784
6668.07435 45.8654647 5799.77996
6726.08279 43.2453498 5814.77557
7019.00832 43.244978 5888.57638
7369.64871 43.2866343 5972.97075
8188.29812 43.2752391 6155.33208

Analysed on 2 PM Wednesday, March 20, 2002 GMT Standard Time - Flute

FFT sample analysed: 0.07033 Mb 0.09288 secs (4096 samples)

From recording of length: 2.268 secs
FFT bin size 10.77 Hz, Peak interpolation method used Mean value

Truncated from 0.1 secs to make the number of samples a power of two

.....................................................................
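The Cents column and the FFT figures can be checked with a few lines of arithmetic. This sketch assumes a sample rate of 44100 Hz, which the output doesn't state but which is consistent with 4096 samples lasting 0.09288 secs; cents are measured from the first (lowest) partial.

```python
import math

SAMPLE_RATE = 44100  # assumed: 4096 / 44100 = 0.09288 s matches the output above


def cents(freq, ref):
    """Interval from ref to freq in cents (1200 per octave)."""
    return 1200 * math.log2(freq / ref)


# Largest power of two that fits in the 0.1 sec excerpt
n_samples = 2 ** int(math.log2(0.1 * SAMPLE_RATE))
duration = n_samples / SAMPLE_RATE  # length of the analysed window in seconds
bin_size = SAMPLE_RATE / n_samples  # spacing of FFT bins in Hz

print(n_samples, round(duration, 5), round(bin_size, 2))
print(round(cents(292.481755, 233.925368), 3))
```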

Better than Jain's method. This time, just as a rough guess from
looking at the figures, it looks like the 95% confidence interval
is less than 1 Hz for a 0.09 sec sample!

Robert

paulerlich <paul@stretch-music.com>

3/20/2002 1:25:20 PM

--- In tuning@y..., "Robert Walker" <robertwalker@n...> wrote:
> Here are the results:
> .........................................
>
> Fractal Tune Smithy Custom voice or FFT results As Timbre Partials
>
> Freq. Amp. (db) Cents
>
> 233.925368 99.7772512 0
> 292.481755 100 386.757972
> 350.908735 99.320157 702.057048
> 467.952108 85.0273321 1200.37508
> 584.983612 85.63987 1586.81747
> 701.925861 87.2793031 1902.3244

you're clearly doing far better this time. care to try the other
jerries?

Robert Walker <robertwalker@ntlworld.com>

3/21/2002 4:18:31 PM

Hi François,

I've tried out the calculation in FTS (Fractal Tune Smithy).

The method I used was to divide the frequency of each
partial by the appropriate integer (its harmonic number):

Freq. (Hz) Amp. (dB) Cents

234.016057 100 0
....
1871.58289 60.7936076 3599.49542

1871.58289/8 - 234.016057
= 233.947861 - 234.016057
= -0.068196

So the higher partials will give increasingly accurate values
for the pitch of the fundamental, assuming they are exact multiples
of the fundamental.

E.g. if they are accurate to within 1 Hz, then on dividing by 8 (for the
8th partial) you get a value accurate to within 0.125 Hz.
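The arithmetic above as a small sketch, using the 8th-partial example and the 1 Hz error bound from the text. The function name is hypothetical, and it assumes, as the text does, that the partial is an exact multiple of the fundamental.

```python
def fundamental_from_partial(freq, n, freq_error=None):
    """Estimate the fundamental from the nth partial (assumed exactly harmonic).
    A measurement error of +/- freq_error Hz in the partial becomes
    +/- freq_error / n Hz in the estimate."""
    estimate = freq / n
    error = None if freq_error is None else freq_error / n
    return estimate, error


est, err = fundamental_from_partial(1871.58289, 8, freq_error=1.0)
print(est - 234.016057, err)  # difference from the measured fundamental, error bound
```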

If the higher partials are all exact multiples, it might be an idea to weight
the count to favour the higher partials. Maybe count the
nth partial n times?

Perhaps weighting them all evenly is a reasonable compromise between
preferring the higher ones as the more exact measurements, and preferring
the lowest ones as possibly more likely to be exact multiples of the
fundamental for real timbres.
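A minimal sketch comparing the two weightings discussed here, assuming each partial is given as a (harmonic number, frequency) pair; the function name and the example values are hypothetical.

```python
def fundamental_estimate(partials, weighting="even"):
    """Combine the per-partial estimates f_n / n of the fundamental.
    partials: list of (n, freq_hz) pairs.
    weighting="even": every partial counts once.
    weighting="n": the nth partial counts n times."""
    if weighting not in ("even", "n"):
        raise ValueError("weighting must be 'even' or 'n'")
    weights = [1.0 if weighting == "even" else float(n) for n, _ in partials]
    total = sum(w * f / n for w, (n, f) in zip(weights, partials))
    return total / sum(weights)


partials = [(1, 201.0), (2, 400.0)]  # hypothetical: fundamental read 1 Hz high
print(fundamental_estimate(partials), fundamental_estimate(partials, "n"))
```

Counting the nth partial n times simplifies to (sum of the partial frequencies) / (sum of the harmonic numbers), so the higher partials dominate, as intended.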

For the standard deviations, we need a good value for the fundamental
in order to tell whether the high partials are accurate. If all the
partials were known only to within 1 Hz and one multiplied the measured fundamental
by 8 to compare it with the 8th partial, that would introduce a possible
9 Hz difference between the two (8 Hz from the fundamental
and 1 Hz from the 8th partial).
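The 9 Hz figure is the worst case, where the two errors point in opposite directions. As a one-line sketch (hypothetical function name):

```python
def worst_case_mismatch(n, fundamental_error, partial_error):
    """Largest possible gap between n * (measured fundamental) and the
    measured nth partial, given absolute error bounds on each measurement."""
    return n * fundamental_error + partial_error


print(worst_case_mismatch(8, 1.0, 1.0))
```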

So instead, I used the new adjusted value
for the fundamental, and assumed it is accurate enough to do
the job. If it is inaccurate, this will at worst slightly overestimate the SD
(by using an estimator for the mean that is offset from the
true value).

Another possibility would have been to divide each partial
by the appropriate integer to compare it with the original fundamental, and
find the SD of those differences instead, but that
would have seriously underestimated the SD. So I think this is
probably the best way.
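A sketch of the two options. It assumes the SD is taken as the root-mean-square of the residuals about zero (treating n times the adjusted fundamental as the expected value of the nth partial) with divisor N; whether FTS divides by N or N-1 isn't stated. The function names and example values are hypothetical.

```python
import math


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def sd_against_multiples(partials, f0):
    """Approach used above: compare each partial with n times the adjusted fundamental."""
    return rms([f - n * f0 for n, f in partials])


def sd_after_dividing(partials, f0):
    """Rejected alternative: divide each partial by n first. Each residual
    shrinks by a factor of n, so the spread comes out much smaller."""
    return rms([f / n - f0 for n, f in partials])


# Hypothetical partials of a 100 Hz tone, some off by 1 Hz
partials = [(1, 100.0), (2, 201.0), (4, 399.0)]
print(sd_against_multiples(partials, 100.0), sd_after_dividing(partials, 100.0))
```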

Here is the result for jerry00:

Search for harmonic timbres with leeway of 5 hz - 3 fundamentals found

Fundamental  Mean diff     New Freq.  Cents      SD (Hz)      SD (cents)  Partials

234.01606    -0.059567529  233.95649  0          0.07022305   0.5195594   29
292.52133    -0.073896311  292.44743  386.32448  0.047917301  0.28363822  19
351.03179    -0.088819712  350.94297  701.99563  0.076143648  0.3755829   10

Mean diff = mean difference from the fundamental after dividing the frequency of the nth partial by n
SD = standard deviation of the differences after subtracting the nth partial from n times the new frequency of the fundamental
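The derived columns can be reproduced from the first two columns and the SD in Hz. This sketch assumes New Freq. = Fundamental + Mean diff, Cents is measured from the first new frequency, and SD cents is the interval from New Freq. to New Freq. + SD. These assumptions are not stated in the output, but they match the printed jerry00 values.

```python
import math


def cents(freq, ref):
    return 1200 * math.log2(freq / ref)


def table_columns(rows):
    """rows: (fundamental_hz, mean_diff_hz, sd_hz) per fundamental.
    Returns (new_freq, cents_from_first, sd_cents) per row."""
    new_freqs = [f0 + diff for f0, diff, _ in rows]
    result = []
    for (_, _, sd), nf in zip(rows, new_freqs):
        result.append((nf, cents(nf, new_freqs[0]), cents(nf + sd, nf)))
    return result


jerry00 = [
    (234.01606, -0.059567529, 0.07022305),
    (292.52133, -0.073896311, 0.047917301),
    (351.03179, -0.088819712, 0.076143648),
]
for row in table_columns(jerry00):
    print(row)
```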

Jerry01:

Fundamental  Mean diff   New Freq.   Cents       SD (Hz)    SD (cents)  Partials

234.016047   -0.0599432  233.956104  0           0.0738842  0.546644    21
295.393103   -0.0018082  295.391294  403.667329  0.0904079  0.5297827   22
351.03558    -0.0943342  350.941246  701.989977  0.0971456  0.4791643   17

Robert

paulerlich <paul@stretch-music.com>

3/21/2002 4:23:49 PM

--- In tuning@y..., "Robert Walker" <robertwalker@n...> wrote:

> Here is the result for jerry00:
>
> Search for harmonic timbres with leeway of 5 hz - 3 fundamentals
found
>
> Fundamental Mean diff New Freq. Cents SD Hz SD
cents Partials
>
> 234.01606 -0.059567529 233.95649 0 0.07022305
0.5195594 29
> 292.52133 -0.073896311 292.44743 386.32448 0.047917301
0.28363822 19
> 351.03179 -0.088819712 350.94297 701.99563 0.076143648
0.3755829 10
>
>
> Mean diff = mean diff from fundamental after dividing freq for nth
partial by n
> SD = for the differences after subtracting nth partial from n
times new freq. for fundamental
>
> Jerry01:
>
> Fundamental Mean diff New Freq. Cents SD Hz SD
cents Partials
>
> 234.016047 -0.0599432 233.956104 0 0.0738842
0.546644 21
> 295.393103 -0.0018082 295.391294 403.667329 0.0904079
0.5297827 22
> 351.03558 -0.0943342 350.941246 701.989977 0.0971456
0.4791643 17
>
> Robert

looks like you're nailing it now. note that this does not violate the
classical uncertainty principle, since here it's *given* that the
signal is to be interpreted as frequencies unchanging over time
(while in many situations that wouldn't be the case), plus it's
*given* that the partials are exactly harmonic (while in many
situations *that* wouldn't be the case).

congratulations on your achievement here!

LAFERRIERE François <francois.laferriere@cegetel.fr>

3/22/2002 1:38:47 AM

Hi Robert

Robert Walker wrote:
> So the higher partials will give increasingly more accurate values
> for the pitch of the fundamental, assuming they are exact multiples
> of the fundamental.
>
> E.g if they are accurate to within 1 Hz, then on dividing by 8, you get
> a value accurate to within 0.125 Hz.

That's exactly what I think.

> When the higher partials are all exact, it might be an idea to weight
> the count to preferentially use the higher partials. Maybe count the
> nth partial n times?

> Perhaps weighting them all evenly is a reasonable compromise between
> preferring the higher ones as the more exact measurements, and preferring
> the lowest ones as possibly more likely to be exact multiples of the
> pitch for real timbres.

That looks like an excellent idea! I will probably try to use it on my
natural voice analysis. Up to now I have not used any such weighting: I made a manual
confidence interval estimate based on the broadness of the peak, divided it
by N (the harmonic number) and took the "most reliable" one. Perhaps the
weighting method you propose will help to increase the accuracy of
my measurements.
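A minimal sketch of the manual procedure described here, under stated assumptions. Each partial's uncertainty is taken as some measure of its peak width in Hz; how that width is measured isn't specified, so `width_hz` is a placeholder. Dividing by the harmonic number gives the uncertainty of the fundamental estimate from that partial, and the estimate with the smallest uncertainty is kept. The function name and example values are hypothetical.

```python
def most_reliable_fundamental(partials):
    """partials: list of (n, freq_hz, width_hz) for harmonic number n.
    Returns (fundamental_estimate_hz, uncertainty_hz) from the partial
    whose width / n is smallest."""
    n, freq, width_hz = min(partials, key=lambda p: p[2] / p[0])
    return freq / n, width_hz / n


estimate, uncertainty = most_reliable_fundamental(
    [(1, 234.0, 10.0), (4, 936.0, 20.0), (8, 1871.6, 12.0)]
)
print(estimate, uncertainty)
```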

>
> For the standard deviations, we need a good value for the fundamental
> in order to tell if the high partials are accurate. If all the
> partials were only known to 1 hz and one multiplied the measured fundamental
> by 8 to compare it with the 8th partial, that introduces a possibly
> 9 Hz difference in frequencies between the two (8 Hz for the fundamental
> and 1 Hz for the 8th partial).
>

This seems to have really improved the accuracy of the results. I totally agree
with Paul Erlich:

> looks like you're nailing it now. note that this does not violate the
> classical uncertainty principle, since here it's *given* that the
> signal is to be interpreted as frequencies unchanging over time
> (while in many situations that wouldn't be the case), plus it's
> *given* that the partials are exactly harmonic (while in many
> situations *that* wouldn't be the case).
>
> congratulations on your achievement here!

I can only share Paul's enthusiasm for your excellent work on this.

François Laferrière