
Re: http://www.intelliscore.net/

Carl Lumma <CLUMMA@NNI.COM>

5/7/2000 7:24:01 PM

>Interesting; have you tried out their demo?

Yes, I have (since my last message). I wasn't surprised, and I'm sure
you're not, to learn that it doesn't really work yet. I think I could get
better results by spending more time with it, but I also think (admittedly,
without spending that time) that I have an idea of the limits of such
improvement. For certain types of music, with a certain amount of touch-up
afterwards, you could get workable results. But for music like mine it's
just a no-go at this point. Ditto for any music of fast tempo, and/or
intricate rhythms. And even in ideal circumstances, I _don't_ buy the 35%
saved-time figure. Touch-up can be very time-consuming. Just like with
OCR (how's your OCR project coming, BTW?).

OTOH, I was just thinking the other day (yesterday, I think!) how far off
something even this good was. I am shocked and aghast. Now, I'd stop
before ruling out the possibility that transcribing by ear could be
obsolete in 3-4 years.

-Carl

Carl Lumma <CLUMMA@NNI.COM>

5/8/2000 7:19:59 AM

>1. When the software guesses wrong, a single mistake often has wide-ranging
>consequences. This suggests that you want to help it along, so that when
>it gets to a place where it isn't sure, you can prevent it from making the
>mistake.

With intelliscore, you have to tap the tempo, and declare the time
signature before the recognition starts.

>2. There are lots of places where the sound is simply ambiguous. You
>likewise want to be able to make decisions about this as part of the first
>pass.

Well, the recognition works in very nearly realtime on my single P2-400, so
you wouldn't have time on the first pass as it stands.

>One would be to synthesize a score with the extracted pitches, and create a
>second spectrogram, which you could compare by eye with the first.

The software, as it stands, creates a MIDI file, so it would simply amount
to feeding back the output from that into a second spectrogram. Trouble is,
synthesized timbres probably look a lot different from real timbres on a
spectrogram at this point. In fact, at this point, the software cannot tell
the difference between instruments -- its MIDI output is all on a single
track.
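
A rough sketch of that feedback loop, under heavy assumptions: the extracted notes are rendered as plain sine tones (not real timbres, which is exactly the caveat above), and the "spectrogram" is a naive windowed DFT over non-overlapping frames. The function names, the 8000 Hz sample rate, and the 256-sample frame are all made up for illustration; none of this is how intelliscore works internally.

```python
import cmath
import math

def synthesize(notes, sample_rate=8000):
    """Render (frequency_hz, duration_s) pairs as plain sine tones."""
    samples = []
    for freq, dur in notes:
        n = int(round(dur * sample_rate))
        samples.extend(math.sin(2 * math.pi * freq * i / sample_rate)
                       for i in range(n))
    return samples

def spectrogram(samples, frame=256):
    """Magnitude spectrum of each non-overlapping, Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    frames = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = [samples[start + i] * window[i] for i in range(frame)]
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / frame)
                        for i, x in enumerate(chunk)))
                for k in range(frame // 2)]
        frames.append(mags)
    return frames

def frame_distance(a, b):
    """Euclidean distance between two spectra, each scaled to unit length."""
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return math.sqrt(sum((x / na - y / nb) ** 2 for x, y in zip(a, b)))

# Compare a "transcription" against a reference, frame by frame.
reference = spectrogram(synthesize([(440.0, 0.1)]))
good_guess = spectrogram(synthesize([(440.0, 0.1)]))
bad_guess = spectrogram(synthesize([(660.0, 0.1)]))
```

A small frame distance means the transcription accounts for the energy in that frame; a large one flags a spot worth checking by ear. With real recordings the timbre mismatch would inflate every distance, so in practice only relative differences between frames would be meaningful.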

>3. There's a lot of variation in the performance of rhythm (and in what
>actually comes out of an instrument) that you want the software to ignore;
>listening, you can often know what's *meant* better than the software
>(though it may be absolutely right about what actually happened).

As it stands, the software is completely ignorant about rhythm. It seems
there are two general ways to proceed...

>6. In order to correct the transcription, you need to compare the
>transcription to the sound; the software ought to provide tools for doing
>this.
>
>What I'm imagining is something where the sound is presented as a spectrogram,
>processed to highlight possible note onsets, etc. Overlaid would be a grid
>showing where pitches (in whatever tuning) would be. The software would
>guess at barlines, and you'd correct them. Likewise for beats within
>bars. Likewise for notes. Once the notes were in place, there'd be
>various tools for comparing the transcription to the original audio.

This would be a great tool. As far as I'm concerned, it wouldn't even have
to guess the rhythms. I can do that about as quickly as I can correct them.
The main thing would be the spectrogram display with note overlay, and a
good score entry window below that. In this setup, the computer simply
handles the absolute and relative pitch skills, which are generally
difficult for humans.
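
The pitch grid for such an overlay is simple to compute, at least for equal divisions of the octave. A minimal sketch, assuming a 440 Hz reference and hypothetical function names (a non-equal tuning would need a table of ratios instead of a single `divisions` parameter):

```python
import math

def pitch_grid(ref_hz=440.0, divisions=12, low_hz=55.0, high_hz=1760.0):
    """(step, frequency) pairs for every grid line between low_hz and high_hz."""
    lo_step = math.ceil(divisions * math.log2(low_hz / ref_hz))
    hi_step = math.floor(divisions * math.log2(high_hz / ref_hz))
    return [(s, ref_hz * 2 ** (s / divisions))
            for s in range(lo_step, hi_step + 1)]

def nearest_grid_line(freq_hz, ref_hz=440.0, divisions=12):
    """Return (step, deviation_in_cents) for the grid line closest to freq_hz."""
    exact = divisions * math.log2(freq_hz / ref_hz)
    step = round(exact)
    cents = (exact - step) * 1200.0 / divisions
    return step, cents

# A spectral peak at 660 Hz (a pure fifth above 440 Hz), read against two grids.
step_12, cents_12 = nearest_grid_line(660.0, divisions=12)
step_19, cents_19 = nearest_grid_line(660.0, divisions=19)
```

Here the 660 Hz peak lands about 2 cents above step 7 of the 12-division grid and about 7 cents above step 11 of the 19-division grid, which is the kind of readout the overlay would give at a glance.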

The second way to proceed would be to try and have the computer do
everything, with no touch-up afterwards. It was this possibility that I
said may be 3 years off. I can imagine an algorithm whereby several meter
choices are generated concurrently, and then fit to the sound source with a
tempo map (à la MOTU's FreeStyle). The tempo maps are then compared, and the
meter choice corresponding to the simplest tempo map wins.
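
A toy version of that idea, to make it concrete. Everything here is an assumption for illustration: note onsets are already known, a beat length has been tapped in, the "meter choices" are reduced to how many grid units divide one beat (2 for simple, 3 for compound), and "simplest" is taken to mean the tempo map with the least relative spread. It is not a description of how FreeStyle or any real product does this.

```python
from statistics import mean, pstdev

def tempo_map(onsets, beat_s, subdivisions):
    """Implied beat length for each inter-onset interval under one meter guess."""
    unit = beat_s / subdivisions
    implied = []
    for a, b in zip(onsets, onsets[1:]):
        ioi = b - a
        units = max(1, round(ioi / unit))  # snap the interval to the grid
        implied.append(ioi / units * subdivisions)
    return implied

def complexity(tmap):
    """Relative spread of the implied beat lengths (0 means a steady tempo)."""
    return pstdev(tmap) / mean(tmap)

def pick_meter(onsets, beat_s, candidates=(2, 3)):
    """Return the candidate whose tempo map is the simplest."""
    return min(candidates,
               key=lambda c: complexity(tempo_map(onsets, beat_s, c)))

# Long-short patterns within a 0.6 s beat (compound feel).
compound = [0.0, 0.4, 0.6, 1.0, 1.2, 1.6]
# Even eighths and a quarter within a 0.6 s beat (simple feel).
simple = [0.0, 0.3, 0.6, 1.2, 1.5, 1.8]
```

Forcing the long-short pattern onto a two-per-beat grid makes the implied tempo lurch between intervals, while the three-per-beat grid keeps it steady, so the latter wins. A real performance would need a tolerance for rubato and a penalty for overly fine grids, since a fine enough grid fits anything.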

~~

I would also be happy to see these kinds of tools, and I'm sure we won't
have too long to wait. OTOH, I am bound to ask which is more of an advance:
Proliferation of software that can transcribe music as well as humans, or
proliferation of what history and science have shown to be a very _human_
skill throughout our society? (I said above that absolute and relative
pitch were "difficult" for humans -- what did I mean by that?)

Sorry -- you aren't working on OCR to my knowledge. Rather, it was speech
recognition (which is probably even more analogous to the current topic than
OCR).

-Carl

Allan Myhara <amyhara@mb.sympatico.ca>

5/8/2000 12:55:23 PM

> >Interesting; have you tried out their demo?
>
> Yes, I have (since my last message). I wasn't surprised, and I'm sure
> you're not, to learn that it doesn't really work yet. I think I could get
> better results by spending more time with it, but I also think (admittedly,
> without spending that time) that I have an idea of the limits of such
> improvement. For certain types of music, with a certain amount of touch-up
> afterwards, you could get workable results. But for music like mine it's
> just a no-go at this point. Ditto for any music of fast tempo, and/or
> intricate rhythms. And even in ideal circumstances, I _don't_ buy the 35%
> saved-time figure. Touch-up can be very time-consuming. Just like with
> OCR (how's your OCR project coming, BTW?).
>
> OTOH, I was just thinking the other day (yesterday, I think!) how far off
> something even this good was. I am shocked and aghast. Now, I'd stop
> before ruling out the possibility that transcribing by ear could be
> obsolete in 3-4 years.
>
> -Carl

Try http://www.chat.ru/~andreenk/
I have an earlier version of their program (it's called Widi, short for
Wave-to-Midi) that worked fairly well for transcribing bird calls and
the like, with a minimum of parameter twiddling. For complex music
though, you can only hear echoes of the original.
--
Bye for now

Allan Myhara
Winnipeg, Manitoba, Canada