Barbershop Spectrogram

🔗Carl Lumma <clumma@xxx.xxxx>

1/10/1999 2:23:03 PM

In the spirit of Gary Morrison's suggestion, this post makes use of remote
resources posted at...

http://lumma.org/list/

The other day Dan Wolf recommended a spectrogram program and I downloaded
it. It works really well. It seems we have reason to be happy that we
live in a time when people can trade such valuable stuff without having to
bother with value abstraction. But I digress.

To test the program, I thought I'd try some Barbershop. For the test, I
picked a brief excerpt of a song by the Happiness Emporium. This group did
most of their recordings (including this one, from their album "That's
Entertainment!") back in the 70's, when Barbershop technique was only a
shadow of what it is today. However, they have a nice sound, and I thought
I'd give them a try.

Using Cool Edit, I extracted the CD audio to 16-bit, 44.1 kHz WAV format, and
mixed it down from stereo to mono. I then ran it thru Spectrogram,
fiddling with the settings until I got a clear picture. The values used to
produce the image full.bmp are:

Attenuation=0
Palette=CB
Freq Scale=Log
FFT Size (points)=8192
Freq Resolution (hertz)=5.4
Band (hertz)=10-22050
Time Scale (ms)=12
Spectrum Average (ms)=1
Toggle Grid=off

This example contains a nice C7 chord, and I cropped it with Cool Edit and
spectrogrized it with the same options as above, except with a time scale
of 4 milliseconds.

Note: I downsampled full.wav to 22K with Cool Edit to make it smaller for
you all. Up.wav remains at 44K.

Note: When I save bmps with the grid on, the grid moves relative to the
rest of the graph. This is why the bmp files on my web site don't have
grids. Does this happen to you?

Then, I opened Notepad with the idea that I'd record the frequencies of the
first 9 peaks (starting at the bottom of the graph) in this chord that I
could get a signal strength greater than -40 dB out of. I picked 9
arbitrarily. I started at the bottom because I knew that's where most of
the parts would be.

I didn't take the readings from the same time on the graph. Rather, I took
each reading at the strongest point in each peak (it happened that all of
these fell very near each other, in the third quarter or so of the total time).
The crosshairs have a resolution of 1 cps, and I did some careful work with
the mouse. If there was a range of frequencies that shared the same
strength, I took the one closest to the middle of the range, taking the
higher frequency if the range was odd.
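The tie-breaking rule in the paragraph above can be sketched as follows. This is one reading of it, assuming the crosshair readings are whole numbers of hertz and that an "odd" range means the top and bottom readings differ by an odd number, so that two middle values exist. The function name and the sample figures are hypothetical.

```python
def pick_reading(low_hz, high_hz):
    """Pick the middle of a plateau of equal-strength readings.

    Assumes integer readings (1 cps crosshair steps). When the span
    high_hz - low_hz is odd there are two middle values, and the
    higher one is taken.
    """
    return (low_hz + high_hz + 1) // 2

# Hypothetical plateaus, not taken from up.txt:
print(pick_reading(190, 194))  # single middle value
print(pick_reading(190, 193))  # two middle values, higher one wins
```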

After I had done all that and closed Spectrogram, I turned my text file
into the chart in up.txt. I am at a loss to explain the results. With the
transform limited to a frequency resolution worse than 5 hertz, I should
not have gotten, nor did I expect to get, anywhere near the accuracy I did.

It may be noticed that I do not have the lead part labeled in the up.txt
chart. At first I thought that maybe the baritone was at peak3 and the
lead at peak4, with peak2 being 5-3. However, this would be an unusual
voicing, and listening reveals that the baritone is singing peak2, and the
lead is singing a 5/2 above the bass. This peak is in the spectrogram
(up.bmp) but not the chart because I couldn't get more than -40 dB out of
it.

Carl

🔗Daniel Wolf <DJWOLF_MATERIAL@xxxxxxxxxx.xxxx>

1/10/1999 4:55:45 PM

Carl Lumma:

Which vowel is being sung on the chord you selected for the analysis? My
impression is that the accuracy of the intonation owes a lot to getting the
voicing of a given chord to support (i.e. 'ring') in the formant range of
the vowel.

Given the likely roots of Barbershop singing in the African-American
tradition, it might be interesting to look also at choral music in other
parts of the African diaspora -- I've heard Ghanaian and Haitian choral
musics which share the tendency to 'ring' sustained chords.

I have been doing analyses of Javanese _rebab_ playing and solo vocal music
(_bawa_), so have not yet paid much attention to western intonation. I'd be
curious to see more analyses of choral and instrumental performance
comparing horizontal and vertical intonations. There are a lot of
assumptions out there (i.e. that strings tend 'naturally' to the
Pythagorean) that ought to be better documented.

The other shareware programs that I mentioned in my earlier posting were:

AcidWAV, which is part of Tommy Anderberg's GSound package (with WAVMaker
and other useful things), available at www.polyhedric.com

and

Tune!It, a handy little digital tuner, reasonably accurate in a pinch with
some nice features, by D. Volkmer, ftp://ftp.zeta.org.au/home/dvolkmer/ or
http://www.zeta.org.au/ftp/home/dvolkmer/

Another nice way to double check an analysis is to resynthesize and then
compare the original with the imitation. AcidWAV has an eight-oscillator
additive synthesis module (in addition to analog, FM, Karplus-Strong and
freely drawn waveform modules) that works well when intervals of 1 Hz are
sufficient.

Dr. Daniel J. Wolf, Komponist
Material Press, Frankfurt am Main
DJWOLF_MATERIAL@COMPUSERVE.COM

🔗alves@xxxxx.xx.xxx.xxxxxxxxxxxxxxx

1/11/1999 9:51:45 AM

>After I had done all that and closed Spectrogram, I turned my text file
>into the chart in up.txt. I am at a loss to explain the results. With the
>transform limited to a frequency resolution worse than 5 hertz, I should
>not have gotten, nor did I expect to get, anywhere near the accuracy I did.

The resolution of an FFT is determined by the window size (the length of
time which is analyzed) and the sampling rate. A discrete Fourier transform
transforms the time-domain data into the same number of data points in the
frequency and phase domains. Since we are interested in the frequency
domain, that leaves one-half the points to represent the available
frequencies (from 0 to the Nyquist frequency, one-half the sample rate).

Thus, to find the frequency resolution of a transform, divide the sampling
rate by the window size in samples. In your example, a 44100 sampling rate
divided by an 8192 window size leaves 5.4 hertz per frequency band. I agree
that it's not enough to do good pitch analysis, especially in the lower
range.
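The arithmetic above is short enough to check directly. A minimal sketch, using only the figures quoted in this thread; the function name is made up for illustration.

```python
def bin_width_hz(sample_rate_hz, window_size):
    """Spacing between adjacent FFT frequency bins, in hertz."""
    return sample_rate_hz / window_size

# 44100 Hz sampling rate, 8192-point window, as in the settings quoted above
print(round(bin_width_hz(44100, 8192), 1))  # 5.4
```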

I used to do frequency analysis on a Synclavier, which allowed window sizes
up to 16384. Unfortunately, I have not found any other spectral analysis
tools which allow more than 8192 samples per window. One possible solution
would be to lower the sampling rate of your file. While this may seem
counter-intuitive, it means that a lot of the resolution is not wasted
analyzing the very high frequencies.
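To see what lowering the sampling rate buys, here is a rough sketch that holds the window at 8192 samples and halves the rate twice. The lower rates are illustrative choices, not settings anyone in this thread reports using. Note that the same window then spans more time, so the finer frequency resolution comes at the cost of coarser time resolution.

```python
WINDOW = 8192  # samples per FFT window, as in the settings quoted earlier

def describe(sample_rate_hz, window=WINDOW):
    """Return (bin width in Hz, Nyquist frequency in Hz, window length in s)."""
    return (sample_rate_hz / window, sample_rate_hz / 2, window / sample_rate_hz)

for rate in (44100, 22050, 11025):
    bin_hz, nyquist_hz, seconds = describe(rate)
    print(f"{rate:>6} Hz: {bin_hz:.2f} Hz per bin, "
          f"band 0-{nyquist_hz:.1f} Hz, {seconds:.3f} s per window")
```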

Bill

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^ Bill Alves email: alves@hmc.edu ^
^ Harvey Mudd College URL: http://www2.hmc.edu/~alves/ ^
^ 301 E. Twelfth St. (909)607-4170 (office) ^
^ Claremont CA 91711 USA (909)607-7600 (fax) ^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

🔗Daniel Wolf <DJWOLF_MATERIAL@xxxxxxxxxx.xxxx>

1/11/1999 10:24:55 AM

Bill Alves wrote:

"I used to do frequency analysis on a Synclavier, which allowed window sizes
up to 16384. Unfortunately, I have not found any other spectral analysis
tools which allow more than 8192 samples per window. One possible solution
would be to lower the sampling rate of your file. While this may seem
counter-intuitive, it means that a lot of the resolution is not wasted
analyzing the very high frequencies."

Spectrogram _does_ allow a window size of 16384. I find it very useful to
narrow the frequency band and analyse a sample in approximately
octave-sized slices.

🔗Greg Schiemer <gregs@xxxx.xxxx.xxx.xxx>

1/12/1999 12:35:45 PM

>Another nice way to double check an analysis is to resynthesize and then
>compare the original with the imitation.

Daniel, thanks for these pointers. I've already downloaded AcidWAV. I expect
success with analysing vocal music will depend on the overtone structure of
the various sounds singers make when they sing lyrics.

🔗Paul H. Erlich <PErlich@xxxxxxxxxxxxx.xxxx>

1/13/1999 12:49:41 PM

>Unfortunately, I have not found any other spectral analysis
>tools which allow more than 8192 samples per window.

Matlab has no limits on the window size, other than the constraints
imposed by your computer's resources. The number of samples doesn't even
have to be a power of two, although the operation is much faster when it
is a power of two.
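The point about window length can be illustrated outside Matlab too. The sketch below is not Matlab's method: it is a direct DFT sum in plain Python, which is slow but works for any window length, and it only evaluates a narrow band of bins to keep the run time short. The sampling rate, window length and test tone are all hypothetical values picked so that the bins are exactly 1 Hz apart; in practice one would use an FFT library.

```python
import cmath
import math

def dft_bin(samples, k):
    """Magnitude of DFT bin k for a window of any length (direct sum, no FFT)."""
    n_total = len(samples)
    step = -2j * math.pi * k / n_total
    return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(samples)))

SAMPLE_RATE = 3000  # Hz; hypothetical, kept small so the direct sum is fast
WINDOW = 3000       # samples; not a power of two, gives 1 Hz bins
TONE_HZ = 193.0     # hypothetical test tone

samples = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
           for n in range(WINDOW)]

bin_hz = SAMPLE_RATE / WINDOW
# Only scan bins 180..200 instead of the whole spectrum.
peak_bin = max(range(180, 201), key=lambda k: dft_bin(samples, k))
print(peak_bin * bin_hz)  # frequency of the strongest bin in the scanned band
```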

🔗Carl Lumma <clumma@xxx.xxxx>

2/3/1999 10:00:58 PM

If you ever have a chance to trust a brand-new (2 months) top-of-the-line
9gig Seagate Cheetah Ultra2/LVD hard drive.... don't. In the event of a
physical head crash, you probably won't be comforted by the fact that
Seagate furnishes only factory serviced drives for replacement.

[Daniel Wolf]
>Which vowel is being sung on the chord you selected for the analysis? My
>impression is that the accuracy of the intonation owes a lot to getting the
>voicing of a given chord to support (i.e. 'ring') in the formant range of
>the vowel.

There's only one vowel known to Barbershop singers: "ah". Of course I
exaggerate, but not by very much. The vowel in the example is clearly "ah".

[Bill Alves]
>Thus, to find the frequency resolution of a transform, divide the sampling
>rate by the window size in samples. In your example, a 44100 sampling rate
>divided by an 8192 window size leaves 5.4 hertz per frequency band. I agree
>that it's not enough to do good pitch analysis, especially in the lower
>range.

Yes, I knew the resolution was "worse than 5 hertz" because I calculated
it. I did work with some Sarangi music using lower sampling rates, and
with Denkla's CSound Bach filtering for all but a narrow band of
frequencies, but I was complaining about (what appeared to be) too much
resolution in this case, not too little.

[Charlie Jordan]
>There is a 7/6 interval near the top. To put that interval over the bass
>line would create a 6:7:?:? chord rather than the 4:5:6:7 shown. The
>6:7:8:10 inversion is the least popular.
>
>Perhaps a separate analysis of each voice would be enlightening. Overtones
>and difference tones might be less prominent.

I'm not sure what you mean. You can't get "each voice" to perform in the
same way without the others. I identified all the parts, and the chord is
most definitely 2-3-5-7; standard Barbershop voicing.

Carl

🔗aloe@xxx.xxx

2/9/1999 11:21:07 PM

At 01:00 AM 2/4/99 -0500, Carl Lumma wrote:

>[Charlie Jordan]

>>Perhaps a separate analysis of each voice would be enlightening. Overtones
>>and difference tones might be less prominent.
>
>I'm not sure what you mean. You can't get "each voice" to perform in the
>same way without the others.

I suppose each voice could be recorded from a separate microphone.

>I identified all the parts, and the chord is
>most definitely 2-3-5-7; standard Barbershop voicing.

There is a peak at 579 hz. There's no measure of the relative contributions
to this peak of the 3rd harmonic of the bass (129 hz) and the 2nd harmonic
of the baritone (193 hz). This may or may not be useful to know.

The spectrogram is certainly informative. I appreciate being able to see the
analysis. Like most research, in answering one question, it asks others.

--Charlie Jordan <http://www.rev.net/~aloe/music>