
FTS developments (was: Improvisation on pitches from a recording of a song)

Yahya Abdal-Aziz <yahya@melbpc.org.au>

4/17/2005 8:17:25 PM

Robert,

You replied to Carl onlist as below; I've inserted
my thoughts inline after the marker [YA].

Regards,
Yahya

________________________________________________________________________
Date: Sat, 16 Apr 2005 11:11:53 +0100
From: "Robert Walker" <robertwalker@...>
Subject: Re: Improvisation on pitches from a recording of a song

Hi Carl,

> It never gives two simultaneous pitches, does it? If it doesn't,
> the volume part of the triple is not needed in technique.

No, you are right, it's strictly monophonic. I do have a frequency
spectrum type pitch tracking option as well, which could find multiple
pitches. That one isn't that good at finding the starts and ends of notes.
However, what you can do with FTS is to edit the results after it finds
the notes - can other wave to midi programs do this?

[YA] Yes, it is a feature of the Professional version of Widisoft.
You can add, merge and suppress notes from analysis, tho I found the
editing interface is not the most intuitive ... The display is interesting
too, in that it tries to show both pitch and amplitude (or loudness).
Colours are user-definable, including those for bar-lines, which can be
set automatically or by user tapping.

FTS shows blue vertical lines on the recording for all the note boundaries
it found - and you can click anywhere to add or remove them,
then get it to refind all the frequencies for the notes again,
using just the note boundaries you have marked out yourself.

If you do that, then the frequency spectrum (FFT) version of the note finding
is pretty good at getting the exact pitches also, because it uses peak
interpolation methods to refine the estimate of the pitch at the peak. With
various tweaks, it is now roughly comparable with the exact wave count in
its precision, though the exact wave count has a slight edge, I think. Both,
it seems, can find pitch accurate to a few hundredths of a cent or so for
computer generated steady pitched notes of a second or so. The wave counting
routinely manages better than a hundredth of a cent for a one second note if
the waveform is suitable - e.g. computer generated exact pitches of a regular
waveform shape.
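
To make the peak interpolation idea concrete, here is a minimal Python
sketch. It is not FTS's code, which isn't shown in this thread. It assumes a
Hann window and a parabola fitted through the log magnitudes of the peak bin
and its two neighbours, which is one common way of refining an FFT peak. All
function names are made up for illustration, and the naive DFT is only there
to keep the example self-contained.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2 of a Hann-windowed signal (naive O(N^2) DFT)."""
    n = len(samples)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                  for i, w in enumerate(windowed))
        mags.append(abs(acc))
    return mags

def coarse_peak_hz(samples, sample_rate):
    """Frequency of the strongest bin, with no refinement."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    return k * sample_rate / len(samples)

def interpolated_peak_hz(samples, sample_rate):
    """Refine the strongest bin by fitting a parabola to log magnitudes."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    a, b, c = (math.log(mags[j]) for j in (k - 1, k, k + 1))
    # Vertex of the parabola through (-1, a), (0, b), (1, c), in bins.
    offset = 0.5 * (a - c) / (a - 2 * b + c)
    return (k + offset) * sample_rate / len(samples)

def cents(freq, ref):
    """Signed distance from ref to freq in cents."""
    return 1200 * math.log2(freq / ref)
```

For example, with a 440 Hz sine sampled at 8000 Hz and a 512 sample window
(figures chosen arbitrarily), `interpolated_peak_hz` should land much closer
to 440 Hz than `coarse_peak_hz`, which can only answer in steps of one bin.
A simple version like this will not reach the hundredths of a cent described
above; that would need longer notes and further refinement.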

So if you want exact pitches, and aren't too bothered about the exact
rhythm, it makes sense to analyse rich timbres in FTS using FFT with user
interaction to find the note boundaries.

So one could do that, and then it could find more than one simultaneous
pitch, as it could then find all the partials for all those
notes if one wanted them.

[YA] I think it was Widisoft that had FFT as its "basic" algorithm,
with a choice of three other "advanced" settings. Are they doing
anything you haven't tried?

At present it just tries to match a harmonic series
to them to find the note, and that part isn't particularly configurable yet;
you just have to try it as it is. But you can get it to show its analysis of a
particular note - and later I can have a go at refining it somewhat. I
could at any rate easily get it to output all the simultaneous pitches it
finds as it goes along the recording, to make data for anyone who wants to
analyse them as they please.
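
As a rough picture of what matching a harmonic series to a set of partials
can look like, here is a small Python sketch. It is an assumption about the
general technique, not a description of what FTS does internally. The
function names, the 20 cent tolerance, the limit of 8 harmonics, and the
rule of preferring the highest candidate on a tie (to avoid picking a
subharmonic) are all choices made up for this example.

```python
import math

def cents_off_harmonic(freq, f0):
    """Distance in cents from freq to the nearest harmonic of f0."""
    n = max(1, round(freq / f0))
    return abs(1200 * math.log2(freq / (n * f0)))

def fit_fundamental(partials, max_harmonic=8, tolerance_cents=20.0):
    """Return (f0, matched_count) for the best-fitting harmonic series.

    Every partial is tried as harmonic 1..max_harmonic of a candidate
    fundamental. A candidate scores one point per partial lying within
    tolerance_cents of one of its harmonics. Ties go to the higher
    fundamental, since f0/2, f0/3, ... trivially match the same partials.
    """
    best = None
    for p in partials:
        for n in range(1, max_harmonic + 1):
            f0 = p / n
            matched = sum(1 for q in partials
                          if cents_off_harmonic(q, f0) <= tolerance_cents)
            key = (matched, f0)
            if best is None or key > best:
                best = key
    return best[1], best[0]
```

Given slightly mistuned partials such as `[220.0, 440.5, 659.0, 881.0]`, this
should settle on a fundamental close to 220 Hz with all four partials
matched. Strongly inharmonic timbres would need a looser tolerance or a
different model altogether.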

But I don't have any plans at all to attempt polyphony; I have made a design
decision in advance just not to attempt that. The task seems a hard one
- my hunch is that it is a sort of pattern recognition thing like
understanding speech or recognising faces, which are both things that
are particularly hard to program. You need to reckon to devote all your
programming and research time to the task for some years if you expect to
get anywhere at all in those areas, I think.

[YA] You could well be right! 8-o I feel that perhaps neural net
techniques may prove more effective in recognising timbres, which
are of course subject to variations between individual instruments
of the same name, and the same instrument in different acoustical
settings.

The ear can hear a whole jumble of partials and instantly pick out that it
consists of say a bassoon, oboe and violin playing a chord, while a program
will have a much harder time of that because how would it tell which
partials belong with which instruments? Maybe by fitting harmonic series for
harmonic partials, but there may be some inharmonicity and the notes may
very well also be played in near to harmonic series type harmonies with each
other. So, maybe really it would need a bank of templates of likely sounds
which it could then try and match against the recording, so you could tell
it that it uses such and such instruments, e.g. a guitar, or may have one in
it, and it could then look for the characteristic fingerprints of that
instrument.

[YA] Using a sampled instrument to provide templates for matching
might limit the effectiveness of recognition, because of the variations
I mentioned earlier. Do you have, for example, useful techniques for
filtering out the effects of room acoustics? I suspect that to do so
might lose the baby with the bathwater! Yet as you say, we humans
are adept at discriminating different components of the whole
soundscape, and assigning them to the correct instruments. Part of
this is doubtless due to familiarity - a learning effect - which is why
I think neural nets may give a workable approach. The best software
along these lines for, e.g., human face recognition is very good indeed.
Another part of our discriminatory ability is doubtless due to the
fact that we hear stereophonically, through two ears which have
systematic frequency response differences. Using these two
instruments together, differentially, enables us to get a 3D fix on
the source of a particular partial. I'd bet even with all your
experience of oboes, flutes, violas and violins, placed just SO in the
orchestra, if you (a) close your eyes and (b) block your right ear,
you'd find it much harder to discern when any one of them
contributed to a particular chord. Unless a pitch recognition system
uses stereo input data, can it hope to do as well as a trained human?

I think if one were ever to be presented with an orchestra consisting of
entirely unfamiliar instruments, it's possible it might take one quite a
while to learn to recognise them - if there were no familiar ones in the
orchestra
at all to get you started. Because the same partials could be divided up
in different ways. E.g. a flute and oboe in unison could just as easily
be a single "flute oboe" instrument say. Or e.g. the sound of a
car or a door creaking could be made up of a large number of instruments
playing quiet sounds, and I'm not sure, if one were unfamiliar with
doors, that one would know that that complex sound was a single thing rather
than the unison of several things played at once well synchronised.

So that's the situation that the computer program faces, no instruments
are familiar to it at all, unless you can figure out how to program it to
recognise them. It could listen to a bell and hear ten tuning forks
played at different pitches and volumes simultaneously.

[YA] I reckon you should just let it learn for itself!

I think when it comes to it once these polyphonic programs are a bit
more advanced they will surely have to build in experience of timbres
of real instruments into them in order to follow polyphonic lines
in complex musical textures. That's my hunch anyway.

Anyway it would be too much for me to attempt that as well, so yes, I'm
specialising on monophonic lines in FTS.

> >I don't do anything about weighting by the duration of the pitches,
> >but one could do that too, especially since longer duration pitches
> >are perhaps likely to be pitched more exactly, or heard with more
> >exactness of pitch too, I mean a grace note of just a hundredth of
> >a second for instance may not be so exactly pitched as a one second
> >note for instance.

> Indeed. And in fact, portamenti, glissandi, and even legato note
> transitions should not contribute to the scale analysis in my view.

Rightio. Another thing to bear in mind as well is that the
pitch detection is more accurate the longer the note is
so very short grace notes may not be recognised quite so
accurately by FTS. Depending that is on the quality of
the recording, the better the quality then the more easily
it will be able to detect shorter notes.

The bird song recordings I tested it with have all
been 8-bit just because that's what I found on the
web sites of bird song I tried, though I didn't look
very far, surely there must be other ones with higher
resolution recordings.

BTW I've just done some more tweaking and fixed a bug,
and as a result it is getting the robin song better
now, see what you think:

http://www.robertinventor.com/Robin_v2.mid

On Celeste for a bit of fun :-).

compare with the original 8 bit recording:
on this page:
http://www.scricciolo.com/eurosongs/canti.htm

European Robin Erithacus rubecula:

http://www.scricciolo.com/eurosongs/Erithacus.rubecula.wav

[YA] Thanks for this link!

The one thing FTS can't do at the moment is to deal
with repeated notes with very brief rests between them.
It just runs them all together treating the rest between
the notes as a bit of interference in the signal. The thing is that it
pays no attention to the amplitude particularly, except to ignore
all information below a threshold to deal with noise.

You'll notice in the robin clip that the first high
note is just a single pitch rather than a repeated
one. It should repeat it twice, each time with a
rapid decay with a fast tremolo effect, if one
listens to the recording slowed down. The Celeste
helps there by being a short duration note at least
- it actually just plays a single long note through
all that part of the song so if you play it on e.g.
whistle it puts a lot of emphasis on the note
and it changes the perceived shape of the melody line
rather a lot at that point. So playing on
Celeste helps to make the melody line sound
more similar.

Actually it would be pretty easy to
just extract a volume envelope for the entire
recording and superimpose that on the pitches
played - play the pitches all at the same volume
on say a whistle, or oboe or whatever depending
on the bird, and then use the midi expression controller
on top of that to match the original volumes
of the recording exactly.
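
A minimal Python sketch of that idea, under stated assumptions: the volume
envelope is taken as the RMS level of fixed-size frames, and it is scaled
linearly to the 0-127 range of the MIDI expression controller (controller
number 11). The function names and the linear mapping are illustrative
choices, not anything FTS is known to do, and how a given synth turns
expression values into loudness will vary.

```python
import math

def rms_envelope(samples, frame_size):
    """RMS level of each complete, non-overlapping frame of the recording."""
    env = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return env

def envelope_to_expression(envelope):
    """Scale an envelope linearly to MIDI expression values (0..127).

    The loudest frame maps to 127. An all-silent envelope maps to zeros.
    """
    peak = max(envelope, default=0.0)
    if peak <= 0.0:
        return [0 for _ in envelope]
    return [min(127, round(127 * v / peak)) for v in envelope]
```

Each resulting value would then be sent as a controller 11 message at the
time of its frame, on top of notes played at constant velocity. Short frames
follow a fast tremolo more closely but produce more controller messages.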

I may give that a go, which will get the
subtleties of tremolo at least though not
vibrato or glissandi of course, and it will deal with repeated
notes at least as far as the effect on the listener
is concerned. They use vibrato of course,
but not quite as much as one might expect;
many of the songs use just pitch glissandi
and then some tremolo now and again, or
alternatively a little in the way of vibrato.
The glissandi too aren't continuous, some
notes are discrete steady pitches, and
others are glissandi and if you listen to it
slowly, it really doesn't run that many
notes together either. It's quite a bit like
human speech, with phrases with gaps between
them.

It's clear that birds do have faster reactions than
humans and live a bit faster, and I wonder if
they could be so very much faster that they
can actually hear all those details in their songs
as they sing... It is interesting to speculate.
I don't know how one could find out.

[YA] There's probably much that is essential,
and some that is accidental and unimportant, in
the songs and other vocalisations of any bird.
Based on Darwinian evolutionary theory, we should
expect each sound a bird makes to perform a useful
function that enhances its chances of survival.
Whether advertising for a mate with a characteristic
song; crying to scare off a predator (as we saw a
family of Australian magpies do yesterday to a
Lesser Australian Raven); or keeping in contact
with other members of family or flock; in each
case, only some part of the sound is essential, as we
can tell by the fact that variations of each are still
effective. ... So I think it's likely that birds CAN
hear almost anything we can extract from their
song with modern tools, at least down to a limit that
is, say, proportionate to their "rate of living". Maybe
a bird that lives seven years instead of seventy would
discriminate events ten times shorter? There's an
approximate "law" that each bird and animal has a
lifetime of about the same number of heartbeats.

Some birds have a lot of vibrato.
The curlew has a continuous vibrato,
and the meadow pipit also has a lot of vibrato
of the ones I've done.

BTW if you listen to the robin slowed down
it sings quite a few very steady discrete pitches though it does
have some legato / portamento type glissandi
- but lots of very steady notes amongst it.

[YA] I think that's where much new music is to be made -
with melodies that move VERY much faster than we're
used to. The implications for harmonic music are, of
course, that the harmonies ought to progress similarly
faster; if not, it's possible to write quite beautiful music
in a "melismatic" style, but that has the effect of separating
melody from accompaniment in a figure-ground way; not
always what one wants ... Imagine a classical orchestra
delivering homophonic music with harmonies changing
20 times a second! Maybe the players can't do it; maybe
our PCs can ...

> Plus, duration-based analysis could even tell us something about
> melodies played on keyboard instruments. There, it isn't needed to
> distinguish note transitions from scale tones, but it could tell us
> something about central vs. passing/auxiliary tones -- even
> "diatonic" melodies rarely use all 7 notes equally.

> >So maybe one could take that into account. I could add an option to
> >FTS to print out all the pitches found as pitch / volume / duration
> >triples for anyone to analyse as they please using their own software
> >too.

> That would be great!

Done! I've done it so it saves them as a comma separated values
file so it shows up in database programs, and should be easy for
a program to read.
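
For anyone wanting to analyse such a file with their own software, here is
a small Python sketch of the duration weighting discussed above. The column
layout is an assumption, since the actual format of the FTS file isn't given
in this thread: each row is taken to be pitch in cents, volume, duration in
seconds, with an optional header row. The function name, the 10 cent bins
and the 0.05 second cutoff for ignoring very short notes are likewise
hypothetical.

```python
import csv
import io
from collections import defaultdict

def duration_weighted_histogram(csv_text, bin_cents=10.0, min_duration=0.05):
    """Total sounding time per octave-reduced pitch bin.

    Rows are assumed to be: pitch in cents, volume, duration in seconds.
    Notes shorter than min_duration (grace notes, transitions) are skipped.
    """
    totals = defaultdict(float)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue
        try:
            pitch, _volume, duration = (float(x) for x in row[:3])
        except ValueError:
            continue  # skip a header or any non-numeric row
        if duration < min_duration:
            continue
        octave_reduced = pitch % 1200.0
        bin_start = bin_cents * round(octave_reduced / bin_cents)
        totals[bin_start % 1200.0] += duration
    return dict(totals)
```

The bins with the largest totals are candidates for the central tones of the
scale, while passing tones should show up with much smaller totals. Volume
is read but not used here; it could be folded into the weight if wanted.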

[YA] Wow! A responsive software supplier! :-)

> >Well it is visible in the interface but as three rows of
> >numbers - you could just use those too - copy / paste and use them
> >as input to one's program.

> It does this now? What's the download url?

I'm just getting it ready now with the new changes and
will upload it and let you know when it is ready.

Done - I've just sent the url to you privately.

Note to everyone else:

This update isn't ready for release quite yet,
though it will be quite soon. Meanwhile
if you are very keen just ask and you can try
it out and see what you make of the feature.
However I like to know who is testing it out
at this stage.

It is definitely in a state of flux, this particular
feature at least, and may possibly change, may not work in quite
the same way when it comes to the release as it does
right now. You may spend some time tweaking the
settings to get it to work well with some particular
instrument, then with the next upload everything gets
changed and you have to start again, for instance.

But if you are keen to give it a go anyway let me know.
BTW it does install as a separate program so you
don't need to worry about it interfering with your
installation of FTS 2.4 if you have that already.
I plan to keep it like that for the release because
there has been quite a change in some sections and
some users may well want to run both programs
concurrently until they get used to 3.0.
Hopefully they will feel it has improved and
it is easier to find one's way around it,
though with even more features there
is yet more to distract one on one's search to
find out how to do some particular thing
:-).

Robert

[YA] Robert, doubtless at some time in the near
future I would like to try out your innovations.
But I think I'd better wait a little for the dust
to settle ... :-)

________________________________________________________________________
