monz FFT question

monz <monz@tonalsoft.com>

1/20/2006 11:55:33 AM

OK, i'll just post my question here and welcome whatever
answers i can get.

I'm doing FFT (Fast Fourier Transform) spectral analysis of
sampled sounds (in the form of .wav files). My software
quantizes the specified frequency range into a number of
"buckets" (for lack of a better term) which must be
powers of 2, and it will give a maximum FFT resolution of
16384 (= 2^14) "buckets" into which it quantizes the specified
frequency range.

However, in order to get the 16384 resolution, my sampled
sound needs to be at least 16384 samples long. At the standard
sampling rate of 44100 samples per second, that translates into
a section of the .wav file which is 0.371519274 seconds long.

The problem is: i want to be able to analyze the spectrum
for much shorter periods of time, say around .01 second.

So i know that to do this i have to analyze two sections of
the .wav file which are .01 second apart, and find some kind
of "delta" (change) between them. I tried using a simple
average (adding together the two amplitudes for each "bucket"
and then dividing by 2), but i'm sure that that's not the
right way to do it.

I can't simply analyze sections of the .wav file which are
only 0.01 second long, because that's only 441 samples and
therefore the maximum resolution i can get is only 256 (= 2^8)
"buckets", and that's nowhere near fine enough.
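The numbers above follow from two relations: a frame of N samples at sampling rate fs lasts N/fs seconds, and its FFT bins are fs/N Hz apart. A quick check in Python, assuming the FFT size is the largest power of 2 that fits in the available samples (the helper names are hypothetical, for illustration only):

```python
def largest_power_of_two_at_most(n: int) -> int:
    """Largest power of 2 that fits in n samples (n >= 1)."""
    return 1 << (n.bit_length() - 1)


def frame_stats(n_samples: int, fs: int = 44100):
    """Return (FFT size, frame length in seconds, bin spacing in Hz)."""
    n_fft = largest_power_of_two_at_most(n_samples)
    return n_fft, n_fft / fs, fs / n_fft


print(frame_stats(16384))  # (16384, ~0.3715 s, ~2.69 Hz between bins)
print(frame_stats(441))    # (256, ~0.0058 s, ~172.3 Hz between bins)
```

So the 0.01-second frame does not just have fewer buckets: neighbouring buckets are about 64 times farther apart in frequency.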

So what's the proper method for doing this with the 16384
resolution? Thanks.

-monz
http://tonalsoft.com
Tonescape microtonal music software

Herman Miller <hmiller@IO.COM>

1/20/2006 8:55:09 PM

monz wrote:
> So what's the proper method for doing this with the 16384
> resolution? Thanks.

There's an inherent tradeoff between frequency resolution and time resolution with FFT, so what you're looking for might be tricky. You can get some information about the time domain by looking at the phase. In particular, you can get a better estimate of the frequency of a wave by comparing the relative phase from adjacent "frames", and calculating what frequency would result in that phase difference over that length of time. Overlapping the frames is supposed to help with this. The results can also be improved by selecting an appropriate window for the FFT analysis.
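A minimal sketch of overlapping, windowed frames (a short-time Fourier transform) with NumPy, assuming a mono signal already loaded as a float array. The function name and the default hop of 441 samples (0.01 s at 44100 Hz) are illustrative choices, not anything from the thread. Note that the hop only sets how often a spectrum is taken; each frame still spans n_fft samples, so the time smearing of a 16384-point frame is unchanged.

```python
import numpy as np


def stft_magnitudes(x, n_fft=16384, hop=441):
    """Magnitude spectra of overlapping Hann-windowed frames.

    Frame i covers x[i*hop : i*hop + n_fft], so consecutive frames overlap
    by n_fft - hop samples. Returns an array of shape (n_frames, n_fft//2 + 1).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(n_fft)  # tapers the frame edges to reduce leakage
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([window * x[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


fs = 44100
t = np.arange(fs) / fs                # one second of samples
x = np.sin(2 * np.pi * 440.0 * t)     # test tone
mags = stft_magnitudes(x)
print(mags.shape)                     # (63, 8193): one row per 0.01 s step
print(mags.argmax(axis=1)[:3])        # strongest bin per frame (163 or 164)
```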

monz <monz@tonalsoft.com>

1/20/2006 9:23:42 PM

Hi Herman,

--- In tuning-math@yahoogroups.com, Herman Miller <hmiller@I...> wrote:

> There's an inherent tradeoff between frequency resolution
> and time resolution with FFT, so what you're looking for
> might be tricky. You can get some information about the
> time domain by looking at the phase. In particular, you can
> get a better estimate of the frequency of a wave by
> comparing the relative phase from adjacent "frames",
> and calculating what frequency would result in that phase
> difference over that length of time. Overlapping the
> frames is supposed to help with this. The results can
> also be improved by selecting an appropriate window
> for the FFT analysis.

Thanks. I am using the idea of overlapping frames, with
the averaging method i described. But i'm sure that
averaging is not giving me good results. It might work
if i used a small enough time difference between the
overlapping frames, but that would entail too much work.

Can you tell me how to do the phase calculations you
describe? Can anyone else offer any help with this?

-monz

Carl Lumma <ekin@lumma.org>

1/20/2006 1:22:12 PM

At 11:55 AM 1/20/2006, you wrote:
>So what's the proper method for doing this with the 16384
>resolution? Thanks.

There's no straightforward method -- you're running into the
classical uncertainty principle. You can use a moving, overlapping
window to try and spot what you're looking for, depending on what
that is. Wavelets are a related way to balance the tradeoffs --
they're a generalized type of moving-window FFT (in a sense).
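To see the tradeoff concretely, here is a small experiment of our own (not from the thread): two sine waves 20 Hz apart are clearly separated by a 16384-sample frame, but merge into a single peak in a 256-sample frame, whose bins are about 172 Hz apart. The peak counting is deliberately crude (local maxima above half the largest value) and the function name is hypothetical.

```python
import numpy as np


def count_peaks(x, n_fft):
    """Count local maxima above half the maximum in the Hann-windowed
    magnitude spectrum of the first n_fft samples of x."""
    spec = np.abs(np.fft.rfft(np.hanning(n_fft) * x[:n_fft]))
    threshold = 0.5 * spec.max()
    return sum(
        1
        for k in range(1, len(spec) - 1)
        if spec[k] > threshold and spec[k] > spec[k - 1] and spec[k] >= spec[k + 1]
    )


fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t) + np.sin(2 * np.pi * 460.0 * t)
print(count_peaks(x, 16384))  # 2: bins ~2.69 Hz apart resolve both tones
print(count_peaks(x, 256))    # 1: bins ~172 Hz apart cannot
```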

-Carl

Graham Breed <gbreed@gmail.com>

1/21/2006 9:01:44 AM

monz wrote:

> Thanks. I am using the idea of overlapping frames, with
> the averaging method i described. But i'm sure that
> averaging is not giving me good results. It might work,
> if i used a small enough time difference between the
> overlapping frames, but that would entail too much work.

Getting hold of a textbook would probably help. These are well-trodden paths. Sethares's "Tuning, Timbre, Spectrum, Scale" has some details. The choice of window is important. There's also a description of extracting the pitch of a signal in The Csound Book.

> Can you tell me how to do the phase calculations you
> describe? Can anyone else offer any help with this?

If you're trying to extract the pitch of a periodic wave, you can do better than the classical uncertainty principle would suggest. The Csound Book explains this, but unfortunately my copy is on another continent so I can't check it. A search for "heterodyning" might help, either on the web or in a library.
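For the heterodyning idea, a rough sketch under our own assumptions: a single steady sine whose frequency is already known to lie near a reference `f_ref`, and a plain block average standing in for a proper low-pass filter. It is not the Csound Book's algorithm, which we have not checked. Multiplying by a complex exponential at `f_ref` shifts the partial down to (f - f_ref) Hz, and the phase of each averaged block then advances by 2*pi*(f - f_ref)*block/fs from one block to the next.

```python
import numpy as np


def heterodyne_freq(x, fs, f_ref, block):
    """Frequency estimates (Hz), one per pair of adjacent blocks.

    Assumes one sine component near f_ref; block is the averaging length in samples.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    # shift the component near f_ref down to (f - f_ref) Hz
    shifted = x * np.exp(-2j * np.pi * f_ref * n / fs)
    # crude low-pass: average each block of `block` samples
    n_blocks = len(shifted) // block
    avg = shifted[: n_blocks * block].reshape(n_blocks, block).mean(axis=1)
    # phase advance between neighbouring blocks, wrapped to (-pi, pi]
    dphi = np.angle(avg[1:] * np.conj(avg[:-1]))
    return f_ref + dphi * fs / (2 * np.pi * block)


fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)
est = heterodyne_freq(x, fs, f_ref=438.0, block=441)  # one estimate per 0.01 s
print(est.mean())  # close to 440
```

The block length plays the same role as a hop: an estimate arrives every `block` samples, and |f - f_ref| must stay below fs/(2*block) (50 Hz here) or the phase wraps.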

Graham

Herman Miller <hmiller@IO.COM>

1/21/2006 9:07:20 AM

monz wrote:
> Thanks. I am using the idea of overlapping frames, with
> the averaging method i described. But i'm sure that
> averaging is not giving me good results. It might work,
> if i used a small enough time difference between the
> overlapping frames, but that would entail too much work.
>
> Can you tell me how to do the phase calculations you
> describe? Can anyone else offer any help with this?

If anything, I'd think that averaging the amplitudes would blur the time resolution even more than the FFT inherently does on its own. But doing the phase calculation could allow you to use a shorter FFT window and still get enough information about the frequency to be useful. This technique has been used for voice compression, and has some application to pitch shifting algorithms as well -- search for "phase vocoder", "pitch shifting", "short time Fourier transform" and similar topics and see what you can dig up. Or as Carl Lumma suggested, you could look into wavelet transforms.

I'm not too familiar with the details, but basically you want to compare the phase difference you get from the analysis with the phase difference you'd expect to see. Each of the "bins" in the FFT responds most strongly to a frequency that's an integer multiple of a fundamental frequency equal to the sampling rate divided by the frame length (44100/16384, about 2.69 Hz in your case). So a 440 Hz sine wave would produce the highest amplitude in bins 163 and 164, with frequencies of 438.74 Hz and 441.43 Hz respectively. But say that you have a .01 second delay between frames; in .01 second, the 440 Hz wave will have advanced 4.4 cycles, so the phase difference would be 0.4 * 2pi, while a 438.74 Hz wave would be expected to have a phase difference of 0.3874 * 2pi, and a 441.43 Hz wave would result in a phase difference of 0.4143 * 2pi. The measured 0.4 exceeds bin 163's expected 0.3874 by 0.0126 cycle in .01 second, i.e. 1.26 Hz, and 438.74 + 1.26 = 440 Hz.
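The arithmetic in the paragraph above is what a phase vocoder does for each bin. A minimal NumPy sketch, assuming a single steady partial and Hann-windowed frames; the function name and parameters are our own, and a real analysis would repeat this for every peak and every frame. The measured phase change is compared with the change expected for the bin's own frequency (2*pi*k*hop/n_fft), the difference is wrapped into [-pi, pi), and the result is converted back to Hz.

```python
import numpy as np


def phase_vocoder_freq(x, fs, n_fft, hop, start=0):
    """Refine the frequency of the strongest peak from the phase change
    between two Hann-windowed frames that begin `hop` samples apart."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    spec1 = np.fft.rfft(window * x[start:start + n_fft])
    spec2 = np.fft.rfft(window * x[start + hop:start + hop + n_fft])
    k = int(np.argmax(np.abs(spec2)))          # strongest bin (163 or 164 for 440 Hz at n_fft=16384)
    expected = 2 * np.pi * k * hop / n_fft     # phase advance of bin k's own frequency
    measured = np.angle(spec2[k]) - np.angle(spec1[k])
    deviation = (measured - expected + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    # deviation / (2*pi) is the extra fraction of a cycle gained in `hop` samples
    return (k + deviation * n_fft / (2 * np.pi * hop)) * fs / n_fft


fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)
print(phase_vocoder_freq(x, fs, n_fft=16384, hop=441))  # close to 440, not 438.74
print(phase_vocoder_freq(x, fs, n_fft=2048, hop=441))   # also close to 440
```

With n_fft=2048 the frame is only about 46 ms long and the bins are about 21.5 Hz apart, yet the estimate still lands close to 440 Hz. This only works while each bin holds one steady partial whose frequency is within fs/(2*hop) (50 Hz here) of the bin's centre; two partials inside the same bin still cannot be separated.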

monz <monz@tonalsoft.com>

1/21/2006 1:35:40 PM

Hi Herman,

--- In tuning-math@yahoogroups.com, Herman Miller <hmiller@I...> wrote:

> If anything, I'd think that averaging the amplitudes would
> blur the time resolution even more than the FFT inherently
> does on its own.

Yes, that's what i think too.

> But doing the phase calculation could allow you to use
> a shorter FFT window and still get enough information about
> the frequency to be useful. This technique has been used
> for voice compression, and has some application to pitch
> shifting algorithms as well -- search for "phase vocoder",
> "pitch shifting", "short time Fourier transform" and similar
> topics and see what you can dig up. Or as Carl Lumma
> suggested, you could look into wavelet transforms.
>
> I'm not too familiar with the details, but basically you
> want to compare the phase difference you get from the
> analysis with the phase difference you'd expect to see.
> Each of the "bins" in the FFT responds most strongly to
> a frequency that's an integer multiple of a fundamental
> frequency, related to the length of the sample
> (2.69 Hz in your case).
>
> So a 440 Hz sine wave would produce the highest amplitude
> in bins 163 and 164, with frequencies of 438.74 Hz and
> 441.43 Hz respectively. But say that you have a .01 second
> delay between samples; in .01 second, the 440 Hz wave will
> have advanced 4.4 cycles, so the phase difference would
> be 0.4 * 2pi, while a 438.74 Hz wave would be expected to
> have a phase difference of 0.3874 * 2pi, and a 441.43 Hz
> wave would result in a phase difference of 0.4143 * 2pi.

Thanks for describing that.

The really good FFT software costs hundreds of dollars,
and i can't afford to spend that right now ... the software
i'm using only gives frequency and time data, and says
nothing about phase.

I fear that i'd have to study it a lot more to really get
it working, and i don't have time for that now. Maybe i'll
just go back to using the direct analysis values, and use
the 0.4-second frame size.

Thanks to everyone for the advice.

-monz

wallyesterpaulrus <perlich@aya.yale.edu>

2/6/2006 10:59:35 AM

Monz, you've run into the "classical" or "mathematical" uncertainty
principle. There's absolutely no way around it, by the very
definition of "frequency" and "time". I provided Yahya with some
links on this during my last session on the main tuning list -- I
suggest you study them carefully.

--- In tuning-math@yahoogroups.com, "monz" <monz@...> wrote:
>
> OK, i'll just post my question here and welcome whatever
> answers i can get.
>
> I'm doing FFT (Fast Fourier Transform) spectral analysis of
> sampled sounds (in the form of .wav files). My software
> quantizes the specified frequency range into a number of
> "buckets" (for lack of a better term) which must be
> powers-of-2, and it will give maximum FFT resolution of
> 16384 (= 2^14) "buckets" into which it quantizes the specified
> frequency range.
>
> However, in order to get the 16384 resolution, my sampled
> sound needs to be at least 16384 samples long. At the standard
> sampling rate of 44100 samples per second, that translates into
> a section of the .wav file which is 0.371519274 second long.
>
> The problem is: i want to be able to analyze the spectrum
> for much shorter periods of time, say around .01 second.

It's inherently impossible, indeed mathematically meaningless, to do
so with anything like the same resolution.

> So i know that to do this i have to analyze two sections of
> the .wav file which are .01 second apart, and find some kind
> of "delta" (change) between them.

?

> I tried using a simple
> average (adding together the two amplitudes for each "bucket"
> and then dividing by 2), but i'm sure that that's not the
> right way to do it.

Can you elaborate on what you did?

> I can't simply analyze sections of the .wav file which are
> only 0.01 second long, because that's only 441 samples and
> therefore the maximum resolution i can get is only 256 (= 2^8)
> "buckets", and that's nowhere near fine enough.