Yahoo Tuning Groups Ultimate Backup tuning-math HE, seeded with the right half of the Stern-Brocot tree

On Tue, Feb 1, 2011 at 11:34 PM, Carl Lumma <carl@lumma.org> wrote:
>
> Mike wrote:
> >So can we please stop saying that HE really only has one free parameter now?
> >
>
> No way Jose. First, I have no idea what I'm looking at.
> The overlaid curves look quite similar; evidently they don't
> include the Stern-Brocot curves. So what are they?

The overlaid curves are various optimizations of HE/DC/whatever. The
Stern-Brocot thing came later, as an afterthought. I didn't think
you'd really take it seriously. We're talking about a field of
attraction for 1/1 that's 250 cents wide.

> Second, you say you used an s-b tree to 20 levels. Really,
> you should run up the number of levels until the curve
> stabilizes, if it can be made to do so. Otherwise, the number
> of levels should be chosen to match what goes into the stable
> Tenney height curve -- somethin like the number of ratios
> admitted, or perhaps their average Tenney height.

There are 2 million ratios in this series. This is as stabilized as I
think that it's going to get. Think about the way the Stern-Brocot
tree adds ratios for an explanation of why 1/1's field of attraction
ends up so wide. I don't understand what you mean by "chosen to match
what goes into the stable Tenney height curve."

> Then we need plots of one curve against another (as I showed
> last) and/or plots of the minima vs. the Tenney height of their
> associated ratios.

So feel free to plot them against your data set. I don't think the
results will yield a line, and you've never told me why they should.
If the minima don't line up exactly, but are still ordered by Tenney
height, you won't get a line, but a bunch of lines intersecting at the
top right.

> Finally, on the overlay, the entropy at 1/1 is zero. What's
> the deal there? And it goes up to 1? On the s-b curve, it
> goes from 0 to 9...?

I've normalized the entropy so that it goes from 0 to 1 in all cases.
Looks like I screwed up the one for the SB curve.

> Postfinally, when you say you truncate the s-b tree to above
> 4/1... do you also truncate it below 1/1? What are you seeding
> it with? I would suggest seeding with 1/1 2/1, or if you insist
> on computing two octaves, 1/1 2/1 4/1.

I'm seeding it with 1/1 1/0, and then computing mediants forever,
pruning if they end up out of range, which is the same thing you're
doing.

-Mike

🔗Mike Battaglia <battaglia01@gmail.com>

🔗Carl Lumma <carl@lumma.org>

>The overlaid curves are various optimizations of HE/DC/whatever.

Yes, whatever are they?

>> Second, you say you used an s-b tree to 20 levels. Really,
>> you should run up the number of levels until the curve
>> stabilizes, if it can be made to do so. Otherwise, the number
>> of levels should be chosen to match what goes into the stable
>> Tenney height curve -- somethin like the number of ratios
>> admitted, or perhaps their average Tenney height.
>
>There are 2 million ratios in this series. This is as stabilized as I
>think that it's going to get. Think about the way the Stern-Brocot
>tree adds ratios for an explanation of why 1/1's field of attraction
>ends up so wide. I don't understand what you mean by "chosen to match
>what goes into the stable Tenney height curve."

There are 4218 ratios between 1 and 2 in a Tenney series of
limit 10,000. Their mean Tenney height is 5010. Paul seemed
to think the Tenney series version was stable at 10,000. It
would be interesting to see what the s-b version does as its
limit is increased. For that, stacked curves can work.

>> Then we need plots of one curve against another (as I showed
>> last) and/or plots of the minima vs. the Tenney height of their
>> associated ratios.
>
>So feel free to plot them against your data set.

How can I plot them if I don't even know what they are? You
dumped a bunch of xls files in a folder with cryptic names.

>> Postfinally, when you say you truncate the s-b tree to above
>> 4/1... do you also truncate it below 1/1? What are you seeding
>> it with? I would suggest seeding with 1/1 2/1, or if you insist
>> on computing two octaves, 1/1 2/1 4/1.
>
>I'm seeding it with 1/1 1/0, and then computing mediants forever,
>pruning if they end up out of range, which is the same thing you're
>doing.

Not quite. For the 1/1 2/1 4/1 seed, the rightmost quadrant of
the tree will be different. However that looks pathological so
forget I mentioned it.

-Carl

🔗Carl Lumma <carl@lumma.org>

🔗Mike Battaglia <battaglia01@gmail.com>

On Wed, Feb 2, 2011 at 4:46 PM, Carl Lumma <carl@lumma.org> wrote:
>
> >The overlaid curves are various optimizations of HE/DC/whatever.
>
> Yes, whatever are they?

I didn't want to get into the specifics at first, because the answer
is very long. These are a bunch of are successive approximations and
optimizations, each one also utilizing the optimization of the
previous ones, that alter what Paul is doing, but always in a way that
calculates -plogp. They are all approximations in the sense that the
sqrt(n*d) widths version is an approximation of the mediant-mediant
model. Sometimes it alters what Paul is doing in ways that don't
correlate with r=1 to the original curve, but always in ways that
preserve the locations of the minima and maxima and such.

They are a series of optimizations that prepares the curve for the
final optimization, which is the convolution. I didn't throw that one
in, which would be opt5, because as you know, I'm having trouble
figuring out which kernel corresponds to HE, if one even does at all.
It will probably end up taking a form where each interval has a
complexity sqrt(n*d + k) + l, where k and l are two constants that
work out to various terms from simplifying out the entropy summation,
which I have not yet managed to do.

So while the results are halfway between HE and DC, they all still
involve taking the actual entropy of something. Some of the
optimizations were determined mathematically, some empirically.

The "ent" one is Paul's HE with n*d<10000 mediant-mediant widths. If
you plot this against the sqrt(n*d) widths version, you won't get a
line. At least I don't, with the tabular data that's on
harmonic_entropy (called, I believe "lumma.txt".). You get something
kind of like a line, but at the top it diverges.

The "opt" one is my recoding of Paul's HE, but with two optimizations.
The first is that I don't bother seeding the model with any intervals
that lie two standard deviations away from the upper and lower bounds
of the cents values I want to calculate. So if I'm going from 0-1200
cents, and s is 1%, then I only use intervals where n*d < 10000 and
where -34 < 1200*log2(n/d) < 1234. The second optimization is that I
take advantage of the first identity in my proof, which is that G(i-d)
= G(d-i), which means you can just work out the integral beforehand
for each interval and get p_i(d) for each dyad a bit more quickly.
This correlates with ent perfectly, which was a bit unexpected given
the lossy nature of the first optimization.

"opt2" adds a second optimization, which is that rather than setting
the p_i(d) for each dyad as Integral(i_low,i_high,G_s(d)), I set it as
(i_high - i_low) * G_s(i-d). So, in other words, I did a
quasi-sqrt(n*d) approximation - rather than assuming that the width
was 1/sqrt(n*d) and multiplying by the value of the Gaussian at the
point of the interval i, I figured out what the actual width was, but
rather than integrating I just multiplied the result of that
calculation by the value of the Gaussian at i. This was derived from
observing how the integral changes as more intervals are added and
i_high - i_low tends to zero. This decorrelates the curve a little
bit, but r should approach 1 as N -> Infinity. We're preparing
ourselves for a convolution now, since the curve now consists of a
bunch of Gaussians *log(Gaussians) added together.

"opt3" adds a third optimization, which is my version of Paul's
sqrt(n*d) widths approximation. I replace (i_high-i_low) with
k/sqrt(n*d) for each interval, where k is a normalizing constant.
Different values of k will either make this work or not work. Paul's
approach was to work out all of the probabilities, and then
retrospectively pick k such that at each point, the probabilities sum
to 1. I wanted to work out an algebraic expression beforehand such
that it converges at infinity that all of the probabilities sum to 1
for all i. I wasn't sure what the exact expression would be, but it
seems to be asymptotic to the number of entries in the series for some
N. So I set k to L = length(series), and it turned out to be the only
value that worked, as well as 0.5L, 2L, etc, but never L^2 or sqrt(L).

This correlates to ent decently well, but less well than opt2. It will
probably converge to the sqrt(n*d) widths at N-> infinity.

"opt4" adds the final optimization, which is that I did all of the
above, but I seeded the calculation with a Farey series rather than a
Tenney series. This is because Farey series are much, much, much
quicker to compute. This means that you get the quickness of the Farey
series, but the slope of the Tenney series curve. This one was
trickier to work out and I did it almost entirely empirically.
Different values of N end up changing the slope, with N < 80 giving a
negative Farey-ish slope, with 80 giving a slope of about 0, and N >
80 giving a positive slope. I don't know why this is, but setting k as
above to length(series) produced the above behavior. The algebra for
the proper k is really over my head now, since I don't know what the
invariant measure for the Tenney series is to compare with that of the
Farey series, so I tweaked it a bit to try and give it a slope of 0
across the range of N. This took too long and I gave up and just went
with N=80. Correlates less well than the others, but the minima are as
always in the same place. The maxima shift sometimes by a few cents
here and there between all of these but are generally in the same
place.

> >There are 2 million ratios in this series. This is as stabilized as I
> >think that it's going to get. Think about the way the Stern-Brocot
> >tree adds ratios for an explanation of why 1/1's field of attraction
> >ends up so wide. I don't understand what you mean by "chosen to match
> >what goes into the stable Tenney height curve."
>
> There are 4218 ratios between 1 and 2 in a Tenney series of
> limit 10,000. Their mean Tenney height is 5010. Paul seemed
> to think the Tenney series version was stable at 10,000. It
> would be interesting to see what the s-b version does as its
> limit is increased. For that, stacked curves can work.

I don't know, but I don't think we'll ever be able to run that test.
It took me a half an hour to run this one. Maybe Colby can hook this
one up for us with his fancy supercomputer. I've run it before with
different values and the curve generally looks the same as the one I
posted. Again, think about it - intervals right next to 1/1,
cents-wise, will take the longest to appear, since every successive
level just brings you one mediant closer to 1/1, and the mediants are
always skewed -away- from 1/1. By the time any intervals around 50
cents appear at all, you have a quarter of a million intervals closer
to 9/8.

-Mike

🔗Carl Lumma <carl@lumma.org>

Sounds interesting, but I'm afraid it's 0a bit of tl;dr and a bit
of WTF for me. I will bow out, with only words of encouragement.

-Carl

At 08:32 PM 2/2/2011, you wrote:
>On Wed, Feb 2, 2011 at 4:46 PM, Carl Lumma <carl@lumma.org> wrote:
>>
>> >The overlaid curves are various optimizations of HE/DC/whatever.
>>
>> Yes, whatever are they?
>
>I didn't want to get into the specifics at first, because the answer
>is very long. These are a bunch of are successive approximations and
>optimizations, each one also utilizing the optimization of the
>previous ones, that alter what Paul is doing, but always in a way that
>calculates -plogp. They are all approximations in the sense that the
>sqrt(n*d) widths version is an approximation of the mediant-mediant
>model. Sometimes it alters what Paul is doing in ways that don't
>correlate with r=1 to the original curve, but always in ways that
>preserve the locations of the minima and maxima and such.
>
>They are a series of optimizations that prepares the curve for the
>final optimization, which is the convolution. I didn't throw that one
>in, which would be opt5, because as you know, I'm having trouble
>figuring out which kernel corresponds to HE, if one even does at all.
>It will probably end up taking a form where each interval has a
>complexity sqrt(n*d + k) + l, where k and l are two constants that
>work out to various terms from simplifying out the entropy summation,
>which I have not yet managed to do.
>
>So while the results are halfway between HE and DC, they all still
>involve taking the actual entropy of something. Some of the
>optimizations were determined mathematically, some empirically.
>
>The "ent" one is Paul's HE with n*d<10000 mediant-mediant widths. If
>you plot this against the sqrt(n*d) widths version, you won't get a
>line. At least I don't, with the tabular data that's on
>harmonic_entropy (called, I believe "lumma.txt".). You get something
>kind of like a line, but at the top it diverges.
>
>The "opt" one is my recoding of Paul's HE, but with two optimizations.
>The first is that I don't bother seeding the model with any intervals
>that lie two standard deviations away from the upper and lower bounds
>of the cents values I want to calculate. So if I'm going from 0-1200
>cents, and s is 1%, then I only use intervals where n*d < 10000 and
>where -34 < 1200*log2(n/d) < 1234. The second optimization is that I
>take advantage of the first identity in my proof, which is that G(i-d)
>= G(d-i), which means you can just work out the integral beforehand
>for each interval and get p_i(d) for each dyad a bit more quickly.
>This correlates with ent perfectly, which was a bit unexpected given
>the lossy nature of the first optimization.
>
>"opt2" adds a second optimization, which is that rather than setting
>the p_i(d) for each dyad as Integral(i_low,i_high,G_s(d)), I set it as
>(i_high - i_low) * G_s(i-d). So, in other words, I did a
>quasi-sqrt(n*d) approximation - rather than assuming that the width
>was 1/sqrt(n*d) and multiplying by the value of the Gaussian at the
>point of the interval i, I figured out what the actual width was, but
>rather than integrating I just multiplied the result of that
>calculation by the value of the Gaussian at i. This was derived from
>observing how the integral changes as more intervals are added and
>i_high - i_low tends to zero. This decorrelates the curve a little
>bit, but r should approach 1 as N -> Infinity. We're preparing
>ourselves for a convolution now, since the curve now consists of a
>bunch of Gaussians *log(Gaussians) added together.
>
>"opt3" adds a third optimization, which is my version of Paul's
>sqrt(n*d) widths approximation. I replace (i_high-i_low) with
>k/sqrt(n*d) for each interval, where k is a normalizing constant.
>Different values of k will either make this work or not work. Paul's
>approach was to work out all of the probabilities, and then
>retrospectively pick k such that at each point, the probabilities sum
>to 1. I wanted to work out an algebraic expression beforehand such
>that it converges at infinity that all of the probabilities sum to 1
>for all i. I wasn't sure what the exact expression would be, but it
>seems to be asymptotic to the number of entries in the series for some
>N. So I set k to L = length(series), and it turned out to be the only
>value that worked, as well as 0.5L, 2L, etc, but never L^2 or sqrt(L).
>
>This correlates to ent decently well, but less well than opt2. It will
>probably converge to the sqrt(n*d) widths at N-> infinity.
>
>"opt4" adds the final optimization, which is that I did all of the
>above, but I seeded the calculation with a Farey series rather than a
>Tenney series. This is because Farey series are much, much, much
>quicker to compute. This means that you get the quickness of the Farey
>series, but the slope of the Tenney series curve. This one was
>trickier to work out and I did it almost entirely empirically.
>Different values of N end up changing the slope, with N < 80 giving a
>negative Farey-ish slope, with 80 giving a slope of about 0, and N >
>80 giving a positive slope. I don't know why this is, but setting k as
>above to length(series) produced the above behavior. The algebra for
>the proper k is really over my head now, since I don't know what the
>invariant measure for the Tenney series is to compare with that of the
>Farey series, so I tweaked it a bit to try and give it a slope of 0
>across the range of N. This took too long and I gave up and just went
>with N=80. Correlates less well than the others, but the minima are as
>always in the same place. The maxima shift sometimes by a few cents
>here and there between all of these but are generally in the same
>place.

🔗Carl Lumma <carl@lumma.org>

🔗Mike Battaglia <battaglia01@gmail.com>

On Wed, Feb 2, 2011 at 11:45 PM, Carl Lumma <carl@lumma.org> wrote:
>
> Sounds interesting, but I'm afraid it's 0a bit of tl;dr and a bit
> of WTF for me. I will bow out, with only words of encouragement.

Haha OK, I'll sum everything up in plain English as follows:

1) I object to using cross-correlations to HE to validate the
convolution's usefulness as an approximation. This is because, while
cross-correlations in general are a good tool to see how data syncs up
with other data, I don't think anyone is using the HE curve like that
- we're more concerned with the locations of the maxima and the minima
and their ordering, not the locations in between. The maxima and
minima are always in the same spot with both models and the minima are
always ordered by Tenney height, but the spot at which a detuned 3/2
becomes equivalent in entropy to a just 5/4 changes between the two
models.

As you know, I am all about the theory whereby a sharpened 3/2 can
become as minor sounding as a 6/5, but I don't think that HE predicts
where that point happens. If it does, then I have been missing out on
an entirely different world of interpretation, and will be the first
to ditch this and try something new. If not, then the convolution
model serves as an ultra-fast and useful approximation to all of the
features of HE that are themselves psychoacoustically justified.

2) I am going to change the term "Distributed Complexity" to be a
simple descriptor referring to any psychoacoustic model that ends up
distributing the consonance of an interval out in pitch space by some
measure of complexity. It is not necessary do this with the same
spreading function for each interval - if so, then you have a
convolution-based model, and if not, then you don't. HE is a type of
model that ends up doing this, and is currently the most theoretically
justfied (maybe). What I've been calling DC I will just call a
convolution-based DC model, and it will be a type of "DC" model.

3) For the convolution model I used to call "DC," I am going to leave
the question of whether or not the -plogp actually works out to the
linear superposition of scaled Gaussians in the infinite case as an
open hypothesis. If this is false, then by approximating -plogp as a
piecewise linear function, we can asymptotically approach HE by
performing one convolution per "piece" and adding them together, with
more pieces yielding a curve that converges to HE. If it's true, then
there is some kernel that yields HE when you convolve it with a
Gaussian.

Right now, as it stands, if I seed the model with sqrt(n*d) heights
instead of n*d heights, the maxima get screwed up vs HE, so there are
bounds on how well the approximation will work. If the hypothesis is
true then some term can be added to the sqrt(n*d) complexity to
correct this, and if not then it can't.

-Mike