
A new signal-processing transform suited for speech recognition

🔗WarrenS <warren.wds@gmail.com>

11/15/2011 1:55:28 PM

1st draft of a new paper by me:
http://dl.dropbox.com/u/3507527/SpeechProc.html

The new transform in some sense unifies the Fourier & Laplace transforms.
It also has "fast" versions à la the FFT.
It also corresponds to a simple model of what your cochlea does.

Comments will be appreciated. The paper may still be a little flaky.
It seems important, I think.


🔗Mike Battaglia <battaglia01@gmail.com>

11/15/2011 2:09:45 PM

This is brief because I don't have the time to explore this in full.
I've been independently exploring something similar for a while now,
although I haven't cobbled together anything formal yet, under two
different guises:

1) Using this to model cochlear interaction, as you said, for the
purposes of working with isoharmonicity buzz and Sethares-inspired
dissonance measures
2) Using this to model complex pitch perception, for the purposes of
working with harmonic entropy and Erlich and Terhardt-inspired
dissonance measures

The same basic concept applies in both cases, but it gets more
complicated with #2.

Some thoughts about what you wrote:
Y(L) = ln | fudge(L) ∫_{t<0} exp(0.0007 t e^L) exp(i e^L t) x(t) dt |

What is e^L? Is e^L the same thing as exp(L)?
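For what it's worth, the formula as written can be checked numerically. Here's a minimal sketch (my reading of it, not Warren's code), which assumes e^L is angular frequency and takes fudge(L) ≡ 1 — both assumptions pending your answer:

```python
import numpy as np

def Y(x, t, L, a=0.0007, fudge=1.0):
    """Numerically evaluate
        Y(L) = ln | fudge(L) * integral_{t<0} exp(a t e^L) exp(i e^L t) x(t) dt |
    for a signal x sampled at negative times t.  a = 0.0007 is the decay
    constant from the formula; fudge(L) is assumed to be 1 here."""
    f = np.exp(L)                      # reading e^L as (angular) frequency
    dt = t[1] - t[0]
    window = np.exp(a * t * f)         # exponential "forgetting" window (t < 0)
    kernel = np.exp(1j * f * t)        # Fourier kernel at frequency e^L
    return np.log(np.abs(fudge * np.sum(window * kernel * x) * dt))

# toy check: a 100 rad/s sinusoid over the past 2 seconds
t = np.linspace(-2.0, 0.0, 20000, endpoint=False)
x = np.cos(100.0 * t)
Ls = np.linspace(np.log(50.0), np.log(200.0), 101)
Yv = np.array([Y(x, t, L) for L in Ls])
peak = np.exp(Ls[np.argmax(Yv)])       # lands near 100, as expected
```

Under those assumptions it behaves like a log-frequency magnitude spectrum with an exponentially decaying memory, which matches the cochlear story.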

Is this basically just the Laplace transform, but taken at a different
vertical line than the imaginary axis to introduce frequency
spreading, hence modeling cochlear effects? If so, and if you're
attempting to use this like a window for a one-sided Fourier
transform, why not just use something like a gammachirp window
instead, which was actually created to model the behavior of the
cochlea? See here:

http://people.csail.mit.edu/malex/research_files/park_wave.pdf

You don't have to use wavelets, but you can use that window to replace
your Laplace-inspired e^(at) window.

Also, I don't think this statement is true:

"Also, again, wavelet transforms involve some arbitrary discontinuous
chopping of the time axis. The cochlear transform does not – it is
invariant under time translation."

I think that's only true for the discrete wavelet transform, not the
continuous one. Also, there are plenty of ways to make the DWT
shift-invariant, including but not limited to this:
http://en.wikipedia.org/wiki/Stationary_wavelet_transform
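The stationary/undecimated idea is simple enough to sketch by hand — here's a toy undecimated Haar transform (my own illustration, not any library's actual API), which just skips the DWT's downsampling step:

```python
import numpy as np

def undecimated_haar(x, levels=3):
    """Toy "stationary" (undecimated) Haar wavelet transform: skip the
    downsampling step of the DWT and instead dilate the filters by 2 at
    each level (the "algorithme a trous").  Circular boundary handling."""
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        shifted = np.roll(approx, -2 ** j)
        details.append((approx - shifted) / 2.0)   # undecimated Haar detail
        approx = (approx + shifted) / 2.0          # undecimated Haar smooth
    return details, approx

# shift-invariance: the coefficients of a shifted signal are just the
# shifted coefficients (false for the ordinary decimated DWT)
x = np.random.randn(64)
d0, a0 = undecimated_haar(x)
d1, a1 = undecimated_haar(np.roll(x, 5))
```

Because every operation here is a circular filter with no decimation, the whole transform commutes with time shifts, which is the property the "arbitrary chopping" complaint is about.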

-Mike

On Tue, Nov 15, 2011 at 4:55 PM, WarrenS <warren.wds@gmail.com> wrote:
>
> 1st draft of a new paper by me:
> http://dl.dropbox.com/u/3507527/SpeechProc.html
>
> New transform in some sense unifies Fourier & Laplace transforms.
> Also has "fast" versions ala FFT.
> Also corresponds to simple model of what your Cochlea does.
>
> Comments will be appreciated. The paper may still be a little flaky.
> It seems important I think.

🔗Keenan Pepper <keenanpepper@gmail.com>

11/15/2011 3:19:59 PM

--- In tuning-math@yahoogroups.com, Mike Battaglia <battaglia01@...> wrote:
> Some thought about what you wrote:
> Y(L) = ln | fudge(L) ∫t<0 exp(0.0007 t e^L) exp(i e^L t) x(t) dt |
>
> What is e^L? Is e^L the same thing as exp(L)?

I'm sure it is. It says at the top that exp(x) is the same as e^x. And if L is log frequency then e^L is frequency.

> Is this basically just the Laplace transform, but taken at a different
> vertical line than the imaginary axis to introduce frequency
> spreading, hence modeling cochlear effects? If so, and if you're
> attempting to use this like a window for a one-sided Fourier
> transform, why not just use something like a gammachirp window
> instead, which was actually created to model the behavior of the
> cochlea? See here:

This is exactly my question on first looking at this paper - how is this any different in concept from a gammatone or gammachirp filter bank?

It might be because this transform is supposed to be invertible, but it doesn't seem to be.

Keenan

🔗Herman Miller <hmiller@prismnet.com>

11/15/2011 5:22:16 PM

On 11/15/2011 4:55 PM, WarrenS wrote:
> 1st draft of a new paper by me:
> http://dl.dropbox.com/u/3507527/SpeechProc.html
>
> New transform in some sense unifies Fourier & Laplace transforms.
> Also has "fast" versions ala FFT.
> Also corresponds to simple model of what your Cochlea does.
>
> Comments will be appreciated. The paper may still be a little flaky.
> It seems important I think.

It's not really my area of expertise, but from what I understand it isn't the linear nature of LPC that's an issue for speech recognition. The problem is that LPC has trouble with anti-resonances such as those in nasal consonants or nasalized vowels. (See Peter Ladefoged's _Elements of Acoustic Phonetics_ for an overview of LPC analysis for speech sounds, although it's written for readers with more of a background in linguistics than math, so you can probably find a better reference that goes more into the mathematical details.)
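For anyone following along, here's roughly what LPC computes — a standard autocorrelation-method Levinson-Durbin sketch (generic textbook LPC, nothing from Ladefoged's book specifically). The anti-resonance problem is visible in the model's form: 1/A(z) has poles only, so it can place resonances (formants) but has no zeros with which to carve out a nasal anti-resonance:

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method linear predictive coding via Levinson-Durbin.
    Returns prediction-filter coefficients a (with a[0] = 1) for the
    all-pole model H(z) = 1 / (a[0] + a[1] z^-1 + ... + a[order] z^-order),
    plus the final prediction-error power."""
    r = np.correlate(x, x, mode='full')[len(x) - 1 : len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err    # reflection coefficient
        rev = a[i::-1].copy()                  # copy to avoid aliased update
        a[:i + 1] += k * rev
        err *= 1.0 - k * k
    return a, err
```

Fitting this to, say, an AR(1) signal x[n] = 0.9 x[n-1] + e[n] recovers a[1] ≈ -0.9, which is the resonance-matching behavior LPC is good at.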

For tuning-math relevance, I'm interested in whether you might be able to use this transform to analyze the musical tuning of a recording, and whether it might be better than other approaches for this purpose. Having a logarithmic frequency response instead of linear like the Fourier transform seems like a useful property. Have you done any experiments along those lines?

🔗WarrenS <warren.wds@gmail.com>

11/16/2011 11:00:40 AM

> What is e^L? Is e^L the same thing as exp(L)?

--yes.

> Is this basically just the Laplace transform,

--yes except that s is complex not real.

> If so, and if you're
> attempting to use this like a window for a one-sided Fourier
> transform, why not just use something like a gammachirp window
> instead, which was actually created to model the behavior of the
> cochlea? See here:
>
> http://people.csail.mit.edu/malex/research_files/park_wave.pdf

--my preliminary impression is I've been scooped.
That is, their "gammachirp filterbank" thing is essentially the same thing
as my "cochlear transform." (Actually I invented this like 20 years ago, but
basically never told anybody... so in some sense maybe I scooped the scoop...)

However, I have fast algorithms and this park_wave
paper apparently does not, although maybe some other scooper does.

> You don't have to use wavelets, but you can use that window to replace
> your Laplace-inspired e^(at) window.

--don't know what you are saying there.

> Also, I don't think this statement is true:
>
> "Also, again, wavelet transforms involve some arbitrary discontinuous
> chopping of the time axis. The cochlear transform does not -- it is
> invariant under time translation."
>
> I think that's only true for the discrete wavelet transform, not the
> continuous one.

--oh. But maybe in practice you have to use DWT?

> Also, there are plenty of ways to make the DWT
> shift-invariant, including but not limited to this:
> http://en.wikipedia.org/wiki/Stationary_wavelet_transform

--oh. Well, the wikipedia article does not explain it worth a damn, but that does look interesting.

🔗WarrenS <warren.wds@gmail.com>

11/16/2011 11:06:54 AM

> It might be because this transform is supposed to be invertible, but it doesn't seem to be.

-- my cochlear transform is invertible in the same sense that the Laplace transform is invertible...
you can do it using a contour integral along a vertical contour. However, for practical
purposes the Laplace transform is not invertible, in the sense that this inversion is numerically
very ill-conditioned. That is also true for the cochlear transform, due to the exponentially
dying-off "window" really clobbering certain dependencies which in principle are still
there mathematically, but in practice, with finite-precision arithmetic, won't be.

So in practice you will only be able to invert it decently for times not too far in the past.
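A quick numerical illustration of this (my own sketch, reusing the a = 0.0007 constant and the reading of e^L as frequency): erase the signal's distant past and the transform value barely moves, so no finite-precision inversion can hope to recover that region — while erasing the recent past changes it a lot.

```python
import numpy as np

a = 0.0007
t = np.linspace(-200.0, 0.0, 200000, endpoint=False)
dt = t[1] - t[0]
x1 = np.cos(100.0 * t)
x2 = x1.copy()
x2[t < -100.0] = 0.0                  # erase the distant past (before t = -100 s)
x3 = x1.copy()
x3[t < -10.0] = 0.0                   # erase all but the recent past

def Y(x, L):
    f = np.exp(L)
    w = np.exp(a * t * f)             # window weight at t = -100, f = 100: e^-7
    return np.log(np.abs(np.sum(w * np.exp(1j * f * t) * x) * dt))

L = np.log(100.0)
gap_far = abs(Y(x1, L) - Y(x2, L))    # tiny: the distant past is nearly invisible
gap_near = abs(Y(x1, L) - Y(x3, L))   # large: the recent past dominates
```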

🔗Mike Battaglia <battaglia01@gmail.com>

11/16/2011 11:43:33 AM

On Wed, Nov 16, 2011 at 2:00 PM, WarrenS <warren.wds@gmail.com> wrote:
>
> > Is this basically just the Laplace transform,
>
> --yes except that s is complex not real.

What do you mean? In the Laplace transform, s in e^-st is supposed to
be complex.

> > If so, and if you're
> > attempting to use this like a window for a one-sided Fourier
> > transform, why not just use something like a gammachirp window
> > instead, which was actually created to model the behavior of the
> > cochlea? See here:
> >
> > http://people.csail.mit.edu/malex/research_files/park_wave.pdf
>
> --my preliminary impression is I've been scooped.
> That is, their "gammachirp filterbank" thing is essentially the same thing
> as my "cochlear transform." (Actually I invented this like 20 years ago, but
> basically never told anybody... some in some sense maybe I scooped the scoop...)
>
> However, I have fast algorithms and this park_wave
> paper apparently does not, although maybe some other scooper does.

There's also this:

http://www.audience.com/technology/fast-cochlea.php

but it's apparently proprietary.

> > You don't have to use wavelets, but you can use that window to replace
> > your Laplace-inspired e^(at) window.
>
> --don't know what you are saying there.

Just use an STFT with a gammachirp window.
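To spell out what that window looks like, here is a sketch of the gammachirp impulse response in the Irino/Patterson style (the parameter values are typical textbook choices, not taken from the park_wave paper):

```python
import numpy as np

def gammachirp(t, fc, n=4, b=1.019, c=1.0):
    """Gammachirp impulse response for t >= 0:
        g(t) = t^(n-1) exp(-2 pi b ERB(fc) t) cos(2 pi fc t + c ln t)
    where ERB(fc) ~ 24.7 + 0.108 fc is the equivalent rectangular
    bandwidth; with c = 0 this reduces to the ordinary gammatone."""
    erb = 24.7 + 0.108 * fc
    env = t ** (n - 1) * np.exp(-2.0 * np.pi * b * erb * t)
    return env * np.cos(2.0 * np.pi * fc * t + c * np.log(np.maximum(t, 1e-12)))

t = np.arange(0.0, 0.05, 1.0 / 44100.0)   # 50 ms at 44.1 kHz
g = gammachirp(t, fc=1000.0)              # a 1 kHz cochlear-style window
```

Windowing a one-sided Fourier transform with g (time-reversed, since g is causal) at each channel's fc is one concrete version of "use that window instead of e^(at)".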

> > Also, I don't think this statement is true:
> >
> > "Also, again, wavelet transforms involve some arbitrary discontinuous
> > chopping of the time axis. The cochlear transform does not -- it is
> > invariant under time translation."
> >
> > I think that's only true for the discrete wavelet transform, not the
> > continuous one.
>
> --oh. But maybe in practice you have to use DWT?

The DWT isn't just the discrete version of the CWT - they're very
different. It's not like the DFT vs. the normal Fourier transform.

> > Also, there are plenty of ways to make the DWT
> > shift-invariant, including but not limited to this:
> > http://en.wikipedia.org/wiki/Stationary_wavelet_transform
>
> --oh. Well, the wikipedia article does not explain it worth a damn, but that does look interesting.

I find Wikipedia articles on math to be terrible, in general, but it's
a neat concept.

-Mike

🔗WarrenS <warren.wds@gmail.com>

11/16/2011 1:00:52 PM

> > However, I have fast algorithms and this park_wave
> > paper apparently does not, although maybe some other scooper does.
>
> There's also this:
>
> http://www.audience.com/technology/fast-cochlea.php
>
> but it's apparently proprietary.

--well, holy crap. It isn't explained but the nifty picture they give looks exactly like mine.
Son of a bitch. Major miscellaneous annoyed words. I presume this is due to Lloyd Watts,
their "chief scientist" (?).

> The DWT isn't just the discrete version of the CWT - they're very
> different. It's not like the DFT vs the normal fourier transform.

--well, maybe you can educate me on wavelets. I read some wavelet papers a long time ago (about when they invented the idea) but not since.
But anyhow, it seemed to me that it was kind of a hybrid of discrete & continuous.

There was a magic function WVT(t) on the real interval [0,1], called "the wavelet,"
which had to be designed and chosen very carefully.
This function looked "rough," e.g. it was only C^2,
i.e. twice differentiable but almost-nowhere three-times differentiable.
Actually, it would not surprise me if this could be done in C^k for any k, and maybe
even C^∞, but I would conjecture it is not possible to use an analytic function; that's
asking for too much smoothness and you probably can't get it. Quite possibly all
this has been proven by others. Anyhow...

Then you decompose the time axis
into a "binary tree of intervals": the integers at length 1,
the half-integers at length 1/2, then lengths 1/4, 1/8, 1/16, etc.,
and in the other direction 2, 4, 8, 16, etc.

Anyhow, then the Cool Theorem was that ANY function from some wide class
was representable uniquely as a sum of coefficients times
wavelets rescaled and shifted to lie on the intervals from that tree.

So that discrete tree explains what I had in mind by "arbitrary chopping of the time axis."
The "continuous" part of the hybrid is because we are talking about a continuous-time signal. For computer purposes you would also discretize time, making it a discrete+discrete hybrid.
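That dyadic-tree picture is easy to make concrete in the Haar case — a toy decimated DWT (the simplest instance of that representation theorem; my own illustration, not anyone's production code):

```python
import numpy as np

def haar_dwt(x):
    """Decimated Haar DWT: at each level the time axis is chopped into
    dyadic intervals and the signal is rewritten as orthonormal averages
    and differences on that binary tree.  Input length must be a power of 2."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        details.append((x[0::2] - x[1::2]) / np.sqrt(2.0))
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    return details, x[0]   # per-level detail coefficients + overall average

# the representation is unique and energy-preserving (orthonormal basis)...
x = np.zeros(16)
x[3] = 1.0
d, avg = haar_dwt(x)
# ...but the fixed dyadic chopping means it is NOT shift-invariant:
d_shift, _ = haar_dwt(np.roll(x, 1))
```

Shifting the input by one sample crosses the tree's interval boundaries, so the coefficients are rearranged rather than shifted, which is exactly the "arbitrary chopping of the time axis."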

Now I see in Wikipedia
http://en.wikipedia.org/wiki/Continuous_wavelet_transform
there is also something they call the "continuous wavelet transform,"
but it just looks stupid, at least the way they describe it.

🔗Carl Lumma <carl@lumma.org>

11/16/2011 2:37:42 PM

Warren wrote:

> > http://www.audience.com/technology/fast-cochlea.php
> >
> > but it's apparently proprietary.
>
> --well, holy crap. It isn't explained but the nifty picture
> they give looks exactly like mine. Son of a bitch. Major
> miscellaneous annoyed words. I presume this is due to
> Lloyd Watts, their "chief scientist" (?).

A good friend of mine was their 2nd employee, and I've
met and corresponded with Watts on several occasions.
His cochlea is based on Dick Lyon's work at Apple in the
late '80s / early '90s. Watts implemented this cochlea
in an FPGA while at Interval Research. His thesis, under
Carver Mead, as well as various Interval papers, document
this work. There is really nothing secret about it. He
ported it to x86 in 2001, since Intel chips were by then
fast enough to do it in real time. He got Audience funded
by demoing that. Work since then has progressed well
beyond cochlear models. Their chip is in the iPhone 4
(and I imagine, the 4S too).

-Carl

🔗WarrenS <warren.wds@gmail.com>

11/17/2011 1:12:14 PM

I found apparently the only published paper by Watts on his speech processors (?) and have put an e-copy here:

https://dl-web.dropbox.com/get/Public/WattsVoiceProc2009.pdf?w=53b4c8a6

it is from IEEE Micro 29,2 (Mar-Apr 2009) 54-61.

It seems a lot less impressive-sounding than their commercial hype-webpage looked.

Apparently the "gammachirp filterbank" idea,
aka Mellin transform, aka Laplace transform with complex s
(the inversion is called "Bromwich integral" by the way)
was invented for speech signal processing purposes by
Roy D. Patterson and/or Toshio Irino.
I found these two papers by them:

J.Acoustical Soc. Amer. 101,1 (1997) 412-419
Speech Commun. 36,3-4 (2002) 181-203.

I do not have access to the second.
A quick look suggests none of these 3 papers recognize the existence
of fast algorithms, so maybe I'm actually not as scooped as I thought...

I also re-examined the wavelet literature and I still agree with
what I'd said before, actually. The wavelet literature seems to
contain a lot of red-herring garbage mixed in with a small number of top-grade jewels. For the jewels, what I said was valid; as for the garbage, who cares -- that stuff is best hermetically sealed for safe disposal and shouldn't be used for anything. (But I realize a ton of authors would disagree with this statement.)