Bayesian tuning

🔗funwithedo19 <nielsed@uah.edu>

5/22/2011 2:04:28 AM

Bayesian tuning

This regards a Bayesian tuning I mentioned on the tuning list. Gene & Mike B suggested this list as a more suitable forum, so I'm restating it here more succinctly. In "tuning", I'd also talked about a related tuning called Kaneko, but it's a secondary notion that won't be mentioned here.

The Bayes theorem is the founding principle behind a large set of methods used in scientific investigation, or, it might be argued, behind any informative cognitive judgment made at all (cf. E.T. Jaynes, Larry Bretthorst, Harold Jeffreys, et al.). The theorem is closely related to psychological principles such as the Weber law (and to more specialized principles such as the Hick and Riccò laws).

While there are many interesting ways to use and to interpret the Bayes theorem, I will talk about one particular usage of the theorem in one particular case, the iterative binary Bayes. The binary case is perhaps the most useful in general and the case that best shows the association with the Weber law and other psychological judgment laws. Two-hypothesis testing just means choosing between two exhaustive, mutually exclusive possibilities, each of which assigns a definite likelihood to the data. Sampling is done to test them, and the sampling should, of course, be random enough to be meaningful.

Even within this one simple case, there are different possibilities available. One could use symmetric or asymmetric forms of the theorem. Also, certain variables might be held constant or considered to change with time, affecting which evaluation methods are considered best. We will consider a typical approach while defining this tuning, and I will state assumptions along the way.

A common form of the binary Bayes formula is to evaluate posterior probability as

P(R|S) = P(R) P(S|R) / P(S) = P(R) / ( P(R) + P(~R) Lmb(S) ),

where "~"="not", "|"="given that", and Lmb(S) is a "likelihood ratio" = P(S|~R) / P(S|R).
This is actually the reciprocal of the common definition for likelihood ratio, but I use it because it tends to make things a little nicer to write.
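
(For example, with P(R) = P(~R) = 1/2 and Lmb(S) = 1/3 - the data three times as likely under R as under ~R - the posterior is P(R|S) = (1/2) / ( 1/2 + (1/2)(1/3) ) = 3/4.)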

Another typical approach is to use the posterior odds

O(R|S) = P(R|S) / P(~R|S) = [ P(R) P(S|R) / P(S) ] / [ P(~R) P(S|~R) / P(S) ] = O(R) / Lmb(S).

Taking the log allows us to say something very nice and elegant:

log O(R|S) = log O(R) - log Lmb(S).

log O(R) concerns the prior odds, and log Lmb(S) concerns the evidential data gathered (and its interpretation). If we assume that all samples are logically independent from one another and represent equal amounts of information, then we can consider Lmb(S) simply to be a constant. This is nice, because it lets us take constant-sized steps toward one or the other hypothesis on each iteration. Actually, there will be two constant step sizes - one for evidence favoring the first hypothesis and another for evidence favoring the second (cf. http://www-biba.inrialpes.fr/Jaynes/cc04q.pdf).
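
To illustrate the constant-step behavior, here is a tiny sketch (the value .9 for Lmb and the program name are arbitrary; nothing below depends on the particular number):

'LOGSTEP.BAS - with a constant Lmb, the log-odds move in equal steps
DEFDBL A-Z
p = 1 / 2                          'prior P(R)
lmb = .9                           'an arbitrary constant likelihood ratio
FOR i = 0 TO 10
PRINT LOG(p / (1 - p)) / LOG(2)    'log2-odds: changes by -log2(lmb) each pass
p = 1 / (1 + lmb * (1 - p) / p)    'one binary Bayes update
NEXT i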

DEFINING THE TUNING

Assume we are performing a given number N of iterations.

Assume initial prior probability P(R) = 1/2, and P(~R) = 1/2.

Also assume that, of our two hypotheses A and B, every single piece of evidence gathered points to one of them – say A.

This leaves us only one parameter we can adjust to affect our results - our likelihood ratio Lmb(S). Presumably this would be determined by our hypotheses, but we will assume here that we are free to write our likelihood guess into our hypotheses A and B.

Let us say that after N=13 iterations, P(R|S) should equal 1/4 (half of the initial probability). We can find a Lmb(S) that produces this result:

First, we note that on a single iteration,

P(R|S) = P(R) / ( P(R) + P(~R) Lmb(S) ) = 1 / ( 1 + Lmb(S) / O(R) ).

Without giving the proof (it is readily done; cf. http://en.wikipedia.org/wiki/Bayesian_inference#Evidence_and_changing_beliefs), I'll state that, in this case,

P(R|S) = 1/4 = 1 / ( 1 + [Lmb(S)]^13 / O(R) ),

and then

Lmb(S) = ( 3 O(R) )^(1/13) = ( 3 P(R) / (1-P(R)) )^(1/13).
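
For concreteness: with P(R) = 1/2 we have O(R) = 1, so Lmb(S) = 3^(1/13), roughly 1.0882. The first update then gives P(R|S) = 1/(1 + 1.0882), about 0.4789, which in cents (1200 * log2 P) is about -1274.7 - the second value in the list below.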

Now consider the probabilities that result from each step of this iterative process, expressed in cents as 1200 * log2(P):
-1200.0
-1274.697152144154
-1352.47887210531
-1433.328721208953
-1517.219560086023
-1604.113928195837
-1693.964559363352
-1786.715018493686
-1882.300441673644
-1980.648359747189
-2081.679584212231
-2185.309133925829
-2291.447181567758
-2400.0

We see that we have defined the octave range we sought. It might be interesting to see what happens were we to invert the likelihood ratio and run the same steps:
-1200.0
-1128.392921308355
-1059.870410433712
-994.4160287015554
-932.0026367428263
-872.5927740168412
-816.1391743485577
-762.5854026430926
-711.8665949872516
-663.9102822249965
-618.6372758542402
-575.9625947320386
-535.7964115381681
-498.0449991346121

Here we have defined a fifth in the opposing direction.

Consider what happens if we subtract the first set of values from the second (1200*(log2 P2 - log2 P1), or, equivalently, 1200*log2(P2/P1)):
0.0
146.3042308357992
292.6084616715984
438.9126925073974
585.2169233431964
731.5211541789956
877.8253850147947
1024.129615850594
1170.433846686393
1316.738077522192
1463.042308357991
1609.34653919379
1755.650770029589
1901.955000865388

Now we have defined 13-ED3 (i.e. equal-tempered Bohlen-Pierce). In general we will produce N-EDn for whatever constants N and n we choose at the outset.

____________

BASIC PROGRAM BAYESTUN.BAS
'A Bayesian sequence in cents assuming all evidence is of the same kind
DEFDBL A-Z
CLS
CONST recip = 1 'Set to -1 for inverse likelihood ratio
CONST offset = 0 'Set to 1200 to begin values at 0 cents
p = 1 / 2
lmb = (3 * p / (1 - p)) ^ (recip * 1 / 13)
FOR i = 0 TO 13
PRINT 1200 * LOG(p) / LOG(2) + offset
p = 1 / (1 + lmb * (1 - p) / p)
NEXT i

🔗Daniel Nielsen <nielsed@uah.edu>

5/22/2011 1:03:51 PM

This program takes N and m and gives the cents listings in file "out.txt".

______________________

'BAYESTU2.BAS

'Produces N-EDm
CONST N = 13
CONST m = 3
CONST offset = 0 'Set to 1200 to begin values at 0 cents

DEFDBL A-Z
OPEN "out.txt" FOR OUTPUT AS #1
CLS
PRINT " (With P in cents)": PRINT : PRINT " P<Lmb>
P<1/Lmb> P<Lmb>-P<1/Lmb>"
p = 1 / 2
p2 = 1 / 2
c = (m + 1) / 2
lmb = ((2 * c - 1) * p / (1 - p)) ^ (1 / N)
FOR i = 0 TO 13
x = 1200 * LOG(p) / LOG(2) + offset
x2 = 1200 * LOG(p2) / LOG(2) + offset
PRINT USING "########.####"; x; x2; x2 - x
PRINT #1, USING "########.####"; x; x2; x2 - x
p = 1 / (1 + lmb * (1 - p) / p)
p2 = 1 / (1 + (1 / lmb) * (1 - p2) / p2)
NEXT i
CLOSE #1

🔗Daniel Nielsen <nielsed@uah.edu>

5/22/2011 1:40:47 PM

Made a superficial typo in the program! FIXED BELOW

___________

'BAYESTU2.BAS

'Produces N-EDm
CONST N = 13
CONST m = 3
CONST offset = 0 'Set to 1200 to begin values at 0 cents

DEFDBL A-Z
OPEN "out.txt" FOR OUTPUT AS #1
CLS
PRINT " (With P in cents)": PRINT : PRINT " P<Lmb>
P<1/Lmb> P<1/Lmb>-P<Lmb>" '<-FIXED TYPO HERE
p = 1 / 2
p2 = 1 / 2
c = (m + 1) / 2
lmb = ((2 * c - 1) * p / (1 - p)) ^ (1 / N)
FOR i = 0 TO 13
x = 1200 * LOG(p) / LOG(2) + offset
x2 = 1200 * LOG(p2) / LOG(2) + offset
PRINT USING "########.####"; x; x2; x2 - x
PRINT #1, USING "########.####"; x; x2; x2 - x
p = 1 / (1 + lmb * (1 - p) / p)
p2 = 1 / (1 + (1 / lmb) * (1 - p2) / p2)
NEXT i
CLOSE #1

🔗Daniel Nielsen <nielsed@uah.edu>

5/22/2011 11:31:18 PM

Promise I won't post any more corrections of this stupid thing, but ANOTHER
TYPO FIXED BELOW:

'BAYESTU2.BAS

>
> 'Produces N-EDm
> CONST N = 13
> CONST m = 3
> CONST offset = 0 'Set to 1200 to begin values at 0 cents
>
> DEFDBL A-Z
> OPEN "out.txt" FOR OUTPUT AS #1
> CLS
> PRINT " (With P in cents)": PRINT : PRINT " P<Lmb>
> P<1/Lmb> P<1/Lmb>-P<Lmb>"
> p = 1 / 2
> p2 = 1 / 2
> c = (m + 1) / 2
> lmb = ((2 * c - 1) * p / (1 - p)) ^ (1 / N)
> FOR i = 0 TO N '<---NOT 13 (duh)
> x = 1200 * LOG(p) / LOG(2) + offset
> x2 = 1200 * LOG(p2) / LOG(2) + offset
> PRINT USING "########.####"; x; x2; x2 - x
> PRINT #1, USING "########.####"; x; x2; x2 - x
> p = 1 / (1 + lmb * (1 - p) / p)
> p2 = 1 / (1 + (1 / lmb) * (1 - p2) / p2)
> NEXT i
> CLOSE #1
>
>

🔗genewardsmith <genewardsmith@sbcglobal.net>

5/23/2011 9:51:41 AM

--- In tuning-math@yahoogroups.com, Daniel Nielsen <nielsed@...> wrote:
>
> Promise I won't post any more corrections of this stupid thing, but ANOTHER
> TYPO FIXED BELOW:

I don't get it: neither x nor x2 seem to depend on i.

🔗Daniel Nielsen <nielsed@uah.edu>

5/23/2011 10:37:51 AM

Due to a cool equivalence, there are (at least) two ways this could have
been written: one dependent on i and one iterative (recursive). I wrote it
iteratively in the program.

Dependent on i:

P_i = 1 / ( 1 + Lmb^i / O_0 )

Iterative:

P_i = 1 / ( 1 + Lmb / O_(i-1) )

where odds O_k = P_k / (1 - P_k)
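
(To spell out the equivalence: with a constant Lmb at every step, O_(i-1) = O_0 / Lmb^(i-1), so Lmb / O_(i-1) = Lmb^i / O_0. In the iterative program the i-dependence hides in p, which is updated at the bottom of the loop.)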

Here it is dependent on i:

'BAYESTU3.BAS

'Produces N-EDm
CONST N = 13
CONST m = 3
CONST offset = 0 'Set to 1200 to begin values at 0 cents

DEFDBL A-Z
OPEN "out.txt" FOR OUTPUT AS #1
CLS
PRINT " (With P in cents)": PRINT : PRINT " P<Lmb>
P<1/Lmb> P<1/Lmb>-P<Lmb>"
pinit = 1 / 2: p2init = 1 / 2
c = (m + 1) / 2
lmb = ((2 * c - 1) * pinit / (1 - pinit)) ^ (1 / N)
FOR i = 0 TO N
p = 1 / (1 + lmb ^ i * (1 - pinit) / pinit)
p2 = 1 / (1 + (1 / lmb) ^ i * (1 - p2init) / p2init)
x = 1200 * LOG(p) / LOG(2) + offset
x2 = 1200 * LOG(p2) / LOG(2) + offset
PRINT USING "########.####"; x; x2; x2 - x
PRINT #1, USING "########.####"; x; x2; x2 - x
NEXT i
CLOSE #1

🔗Daniel Nielsen <nielsed@uah.edu>

5/23/2011 11:26:51 AM

Don't ask me why I wrote

c = (m + 1) / 2
lmb = ((2 * c - 1) * pinit / (1 - pinit)) ^ (1 / N)

instead of

lmb = (m * pinit / (1 - pinit)) ^ (1 / N)

I think there was some reason, but it doesn't matter now.

Dan N

On Mon, May 23, 2011 at 12:37 PM, Daniel Nielsen <nielsed@uah.edu> wrote:

> Due to a cool equivalence, there are (at least) two ways this could have
> been written: one dependent on i and one iterative (recursive). I wrote it
> iteratively in the program.
>
> Dependent on i:
>
> P_i = 1 / ( 1 + Lmb^i / O_0 )
>
> Iterative:
>
> P_i = 1 / ( 1 + Lmb / O_(i-1) )
>
> where odds O_k = P_k / (1 - P_k)
>
> Here it is dependent on i:
>
> 'BAYESTU3.BAS
>
> 'Produces N-EDm
> CONST N = 13
> CONST m = 3
> CONST offset = 0 'Set to 1200 to begin values at 0 cents
>
> DEFDBL A-Z
> OPEN "out.txt" FOR OUTPUT AS #1
> CLS
> PRINT " (With P in cents)": PRINT : PRINT " P<Lmb>
> P<1/Lmb> P<1/Lmb>-P<Lmb>"
> pinit = 1 / 2: p2init = 1 / 2
> c = (m + 1) / 2
> lmb = ((2 * c - 1) * pinit / (1 - pinit)) ^ (1 / N)
> FOR i = 0 TO N
> p = 1 / (1 + lmb ^ i * (1 - pinit) / pinit)
> p2 = 1 / (1 + (1 / lmb) ^ i * (1 - p2init) / p2init)
> x = 1200 * LOG(p) / LOG(2) + offset
> x2 = 1200 * LOG(p2) / LOG(2) + offset
> PRINT USING "########.####"; x; x2; x2 - x
> PRINT #1, USING "########.####"; x; x2; x2 - x
> NEXT i
> CLOSE #1
>
>

🔗Daniel Nielsen <nielsed@uah.edu>

5/23/2011 3:07:06 PM

So I realized what I said about the EDm being produced by uniform
log-of-odds updating was correct (probably should have been pretty obvious
to me, but I'll blame distractions).

What I'm talking about is this:

Since in the program P and P2 are exhaustive probabilities covering the two
possibilities, and P is updated uniformly with Lmb (since all evidence is
assumed to be of one type) while P2 is updated with the opposite evidence
likelihood ratio (1/Lmb), P2 is simply the complementary probability (1-P).

Therefore

log(P2/P) = log( (1-P) / P ), i.e. the log-of-odds with its sign flipped

This equivalence can be seen between x2 and x3 in the program below.

The point is that the EDm scale represents odds, while the others represent
probabilities. (Possible extensions might be a scale based on some odds
ratio R, or maybe using hypotheses that are not exhaustive).
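
(Spelled out, with pinit = 1/2 so that the prior odds are 1: P = 1/(1 + lmb^i) and P2 = lmb^i/(1 + lmb^i) = 1 - P, so x2 - x = 1200 * log2(P2/P) = i * 1200 * log2(lmb) = i * (1200 * log2 m)/N, which is exactly step i of N-EDm.)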

___________

'BAYESTU4.BAS

'Produces N-EDm
CONST N = 13
CONST m = 3
CONST offset = 0 'Set to 1200 to begin values at 0 cents

DEFDBL A-Z
OPEN "out.txt" FOR OUTPUT AS #1
CLS
PRINT " (With P in cents)": PRINT : PRINT " P<Lmb>
P<1/Lmb> 1-P<Lmb> P<1/Lmb>-P<Lmb>"
pinit = 1 / 2: p2init = 1 - pinit
lmb = (m * pinit / (1 - pinit)) ^ (1 / N)
FOR i = 0 TO N
p = 1 / (1 + lmb ^ i * (1 - pinit) / pinit)
p2 = 1 / (1 + (1 / lmb) ^ i * (1 - p2init) / p2init)
x = 1200 * LOG(p) / LOG(2) + offset
x2 = 1200 * LOG(p2) / LOG(2) + offset
x3 = 1200 * LOG(1 - p) / LOG(2) + offset
PRINT USING "########.####"; x; x2; x3; x2 - x
PRINT #1, USING "########.####"; x; x2; x3; x2 - x
NEXT i
CLOSE #1

🔗Daniel Nielsen <nielsed@uah.edu>

6/1/2011 11:04:03 PM

I noticed that the previously given Bayes tuning is very close to Shaahin
Mohajeri's ADO, differing by less than 3 cents at max, when the initial ADO
constant (what is called A1 at
http://sites.google.com/site/240edo/arithmeticrationaldivisionsofoctave) is
set to the first iteration value of the Bayes sequence.

n divisions of m/1

(In cents)

m=2; n=12

Bayes ADO Difference

0 0 0.0000000
81.06122 81.06122 0.0000000
165.7415 165.5659 0.1756287
254.0175 253.5139 0.5036011
345.8528 344.9054 0.9473877
441.1953 439.7402 1.4551086
539.9812 538.0186 1.9626465
642.1331 639.7402 2.3928833
747.5635 744.9053 2.6582031
856.1749 853.5139 2.6609497
967.8609 965.5659 2.2950439
1082.509 1081.061 1.4475098
1200 1200 0.0000000

For m>2, the difference increases considerably (but in a predictable
fashion). Here is m=3 (n=12):

0 0 0.0000000
119.9863 119.9863 0.0000000
247.7172 246.9744 0.7428284
383.0897 380.9643 2.1253662
525.9362 521.9561 3.9801025
676.0291 669.9496 6.0795288
833.0903 824.9449 8.1453247
996.7983 986.9421 9.8562012
1166.798 1155.941 10.8568115
1342.71 1331.942 10.7680664
1524.14 1514.944 9.1958008
1710.689 1704.949 5.7398682
1901.955 1901.955 0.0001221

Sorry if the email server reformats that all screwy.

Dan N

_______________

'ARIBAYES.BAS

DECLARE FUNCTION lb! (x!)
DEFSNG A-Z

CLS

m = 2 'Period interval size
n = 12 'Number of divisions
p = 1 / 2 ' Initial probability for Bayes

FOR k = 0 TO n

'Bayes
x = 1200 * lb(1 + ((2 * m - 1) * p / (1 - p)) ^ (k / n) * (1 - p) / p) - 1200
PRINT x,

'Arithmetic sum
IF k = 1 THEN a = x
IF k > 0 THEN y = k * (a + (k - 1) * (1200 * lb(m) - n * a) / (n * (n - 1))) ELSE y = 0
PRINT y,

PRINT USING "####.#######"; x - y

NEXT

FUNCTION lb (x)
lb = LOG(x) / LOG(2)
END FUNCTION

🔗Daniel Nielsen <nielsed@uah.edu>

6/1/2011 11:09:39 PM

On Thu, Jun 2, 2011 at 1:04 AM, Daniel Nielsen <nielsed@uah.edu> wrote:

> I noticed that the previously given Bayes tuning is very close to Shaahin
> Mohajeri's ADO, differing by less than 3 cents at max, when the initial ADO
> constant (what is called A1 at
> http://sites.google.com/site/240edo/arithmeticrationaldivisionsofoctave)
> is set to the first iteration value of the Bayes sequence.
>

Oops! - meant to link to
http://sites.google.com/site/240edo/arithmeticirrationaldivisions(aid)

🔗Daniel Nielsen <nielsed@uah.edu>

6/2/2011 7:00:10 AM

Some .SCL files of the Bayes stuff are at DanNielsen/bayes.zip

It was generated with..

DECLARE FUNCTION lb! (x!)
DEFDBL A-Z

CLS

DIM div(5)

div(0) = 12
div(1) = 15
div(2) = 17
div(3) = 19
div(4) = 22

FOR i = 0 TO 4

m = 2
n = div(i)
p = 1 / 2

file$ = "bayes" + RIGHT$(STR$(n), 2) + ".scl"

OPEN file$ FOR OUTPUT AS #1

PRINT #1, "! bayes";
PRINT #1, USING "##"; n;
PRINT #1, ".scl"
PRINT #1, "!"
PRINT #1, "Iterative binary Bayes (uniform data, initial prior=.5)"
PRINT #1, n
PRINT #1, "!"

FOR k = 1 TO n - 1

'Bayes
x = 1200 * lb(1 + ((2 * m - 1) * p / (1 - p)) ^ (k / n) * (1 - p) / p) - 1200
PRINT #1, USING "#####.#############"; x

NEXT k

PRINT #1, " ";
PRINT #1, USING "#"; m;
PRINT #1, "/1"

CLOSE #1

NEXT i

DEFSNG A-Z
FUNCTION lb (x)
lb = LOG(x) / LOG(2)
END FUNCTION

🔗Daniel Nielsen <nielsed@uah.edu>

6/2/2011 6:32:24 PM

This time Bayes was compared to both ADO (as before) and ADL. ADO was a
closer fit, as expected, since it's so friggin' close to begin with:

(in cents; 12 divisions of 2/1)

BAYES ADO Diff ADL Diff
0.0000 0.0000 0.0000 0.0000 0.0000
81.0613 81.0613 0.0000 81.0613 -0.0000
165.7414 165.5660 0.1754 164.6934 1.0479
254.0176 253.5141 0.5035 251.0874 2.9302
345.8528 344.9056 0.9472 340.4574 5.3954
441.1955 439.7405 1.4549 433.0443 8.1511
539.9812 538.0188 1.9624 529.1211 10.8601
642.1331 639.7405 2.3926 628.9982 13.1350
747.5635 744.9056 2.6579 733.0311 14.5324
856.1748 853.5141 2.6607 841.6295 14.5452
967.8609 965.5660 2.2949 955.2692 12.5916
1082.5088 1081.0613 1.4475 1074.5069 8.0019
1200.0000 1200.0000 0.0000 1200.0000 0.0000

Produced with..
_____________

'ARIBAYES.BAS

DECLARE FUNCTION lb# (x#)
DEFDBL A-Z

CLS

m = 2
n = 12
p = .5

FOR k = 0 TO n

'Bayes
x = 1200 * lb(1 + ((2 * m - 1) * p / (1 - p)) ^ (k / n) * (1 - p) / p) - 1200
PRINT USING "######.####"; x;

'ADm
IF k = 1 THEN a = x
IF k > 0 THEN y = k * (a + (k - 1) * (1200 * lb(m) - n * a) / (n * (n - 1))) ELSE y = 0
PRINT USING "######.####"; y;

PRINT USING "######.####"; x - y;

'ADL
IF k = 1 THEN b = 1 - 2 ^ (-x / 1200)
IF k > 0 THEN z = 1200 * lb(1 / (1 - (k * (b + (k - 1) * ((1 - 1 / m) - n * b) / (n * (n - 1)))))) ELSE z = 0
PRINT USING "######.####"; z;

PRINT USING "######.####"; x - z

IF k MOD 24 = 23 THEN SLEEP

NEXT

FUNCTION lb (x)
lb = LOG(x) / LOG(2)
END FUNCTION

🔗Mike Battaglia <battaglia01@gmail.com>

6/2/2011 8:54:35 PM

This is way over my head. Could you explain what you're doing in more
layman's terms? I don't know anything about Bayesian inference or
Binary Bayes or Weber's law.

In general, you will find that this is a very new, experimental, and
interdisciplinary community - things you might assume that everyone
knows are often things that only you know, and hence can contribute to
the theory. So you might have to explain things from the start
sometimes, or reference us to some further reading on it.

It seems really interesting though, and since you're claiming it has
some kind of relationship to the Father example I posted, it looks
worth getting into.

-Mike

On Thu, Jun 2, 2011 at 9:32 PM, Daniel Nielsen <nielsed@uah.edu> wrote:
>
> This time Bayes was compared to both ADO (as before) and ADL. ADO was a closer fit, as expected, since it's so friggin' close to begin with:
> (in cents; 12 divisions of 2/1)
> BAYES        ADO        Diff           ADL         Diff
>      0.0000     0.0000     0.0000     0.0000     0.0000
>     81.0613    81.0613     0.0000    81.0613    -0.0000
>    165.7414   165.5660     0.1754   164.6934     1.0479
>    254.0176   253.5141     0.5035   251.0874     2.9302
>    345.8528   344.9056     0.9472   340.4574     5.3954
>    441.1955   439.7405     1.4549   433.0443     8.1511
>    539.9812   538.0188     1.9624   529.1211    10.8601
>    642.1331   639.7405     2.3926   628.9982    13.1350
>    747.5635   744.9056     2.6579   733.0311    14.5324
>    856.1748   853.5141     2.6607   841.6295    14.5452
>    967.8609   965.5660     2.2949   955.2692    12.5916
>   1082.5088  1081.0613     1.4475  1074.5069     8.0019
>   1200.0000  1200.0000     0.0000  1200.0000     0.0000

🔗Daniel Nielsen <nielsed@uah.edu>

6/3/2011 12:11:40 AM

On Thu, Jun 2, 2011 at 10:54 PM, Mike Battaglia <battaglia01@gmail.com> wrote:
>
> This is way over my head. Could you explain what you're doing in more
> layman's terms? I don't know anything about Bayesian inference or
> Binary Bayes or Weber's law.
>
> In general, you will find that this is a very new, experimental, and
> interdisciplinary community - things you might assume that everyone
> knows are often things that only you know, and hence can contribute to
> the theory. So you might have to explain things from the start
> sometimes, or reference us to some further reading on it.
>
I did link to a chapter (Elementary Hypothesis Testing, Probability Theory:
The Logic of Science, http://www-biba.inrialpes.fr/Jaynes/cc04q.pdf) in ET
Jaynes' posthumously published book, but I can understand if you don't want
to go through those pages. I haven't read it in a good while, so take
everything I say with a grain of salt. The statistical devil is often in the
details.

Lemme try to put it in a nutshell.

We have 2 hypotheses in mind concerning the state of an urn full of balls,
where each ball is known to be either white or black, as well as some
knowledge about how we deal with this urn. The hypotheses are statements
that are mutually exclusive and exhaustive; i.e., they cover all possible
conclusions without overlapping. For instance, we might know that we woke up
in either Whitetown or Blackville (please forgive unintentional racial
otones! :/), but not which. If we know that urns in Whitetown are 80% white,
and urns in Blackville are 90% black, then the two hypotheses are
Hyp. A: p(white) = 80%
Hyp. B = not A: p(white) = 10%

Let's say that every time we draw a ball we replace it and shake
("randomize") the urn again. This is called sampling with replacement, and
it leads to much easier likelihood calculations, since each sampling is now
"logically independent" of every other.

FOOTNOTE PARAGRAPH:
(The sampling distribution that results from this assumption is the binomial
one. In general, a sampling distribution just describes how many ways there
are of arriving at our current state as compared to the other possible
states; for instance, drawing 100 balls from an urn with 50% probability of
giving white, it is less likely that we would draw 100 white balls than 50
white and 50 black, because there are more "paths" that arrive at 50/100
than 100/100. If we were sampling without replacement, the hypergeometric
distribution would result, unless the urn contains so many balls we are only
"taking a drop from the bucket". The sampling distribution does not matter
for our purposes here, but I mention it because it could be useful in
generating sequences of tones. Here's something completely unrelated that
shows this idea of binomial numbers representing paths to a state:
http://reocities.com/Vienna/9349/combinatorics.html#pascal)

So now we use the Bayes formula to update our conceived "probability" of
drawing white every time we draw. If we draw white a lot, we will tend to
think that p(white) is high, and consequently will tend toward hypothesis A.
If we draw black a lot, we will judge p(white) as low, and tend toward
hypothesis B.

At least, that would be one way to do it, but it is not the best
computationally.

Instead, we can use the log-of-odds = lb( p/(1-p) ) (lb = binary log), and add the
log-of-likelihood-ratio every time we draw (NOTE: here I'm using the typical
definition of likelihood ratio, not the reciprocal used in previous posts).
Since each draw is independent, the log-of-likelihood-ratio will take one of
2 constant values each draw, depending on whether white or black was drawn.
Let's say these 2 values are L and s. (No, this is not a MOS, but why not
co-opt the symbols, since one of them is likely to be smaller than the
other?)

L = log-likelihood-ratio of white = lb [ p(white | A) / p(white | B) ] = lb .8 - lb .1 = 3
s = log-likelihood-ratio of black = lb [ p(black | A) / p(black | B) ] = lb .2 - lb .9 = -2.17

We could compute all our samples in one fatally simple blow instead by

log-of-odds = prior-log-of-odds + M*L + N*s

where M and N are the number of white samples and black samples
respectively.
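
Just to make that concrete, here is a little sketch of the one-blow update (only the .8/.1/.2/.9 urn numbers come from above; the draw counts and the program name are made up for illustration):

'URNBLOW.BAS - posterior log-of-odds from the draw counts in one blow
DEFDBL A-Z
LW = LOG(.8 / .1) / LOG(2)      'log-likelihood-ratio of a white draw, = 3
SB = LOG(.2 / .9) / LOG(2)      'log-likelihood-ratio of a black draw, ~ -2.17
prior = 0                       'prior log-of-odds, from P(A) = 1/2
M = 7: N = 3                    'say 7 white and 3 black were drawn
logodds = prior + M * LW + N * SB
PRINT "posterior log-of-odds:"; logodds
PRINT "posterior P(A):"; 2 ^ logodds / (1 + 2 ^ logodds)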

WHY THIS IS IMPORTANT: Jaynes argues that this is how perception is done -
that, for better or worse, we are hardwired for rapid binary Bayesian
judgement-making. That is why we see the logarithmic relation so often, as
in pitch perception - and it is related to the Weber and Shannon laws of
human information processing.

OKAY, WHETHER OR NOT YOU BUY THAT, HOW IS THE BAYES TUNING CONSTRUCTED?

The Bayes tuning assumes that all the balls drawn from the urn are the same
color; this gives us a nice sweep in one direction with which to define a
tuning. One might guess that humans are used to processing such streams of
uniform data results.

However, it would not be very interesting simply to use the formula

cents = 1200 * log-of-odds = 1200 * (prior-log-of-odds + k*L) for k = 0..M

All that gives us is ED of some equivalence interval. By adjusting L, we can
define what that equivalence interval would be, but what's the point of all
this, if we only wind up at EDm?

Let us consider instead defining cents by log-of-probability, another useful
value.
By definition,

prob = 1 / (1 + 1 / (likelihood-ratio^k * initial-prior-odds))
..so..
cents = 1200 * log-of-prob = -1200 * lb(1 + 1 / (likelihood-ratio^k * initial-prior-odds))

Now that's what I'm talking about :) The initial-prior-probability chosen
was 1/2, a very sensible value, and the target value was set an octave away
at 1/4. Since 1/2 to 1/4 is a range from -1200 to -2400 cents, the formula
was ALTERED slightly to

cents = 1200 * [ lb(1 + 1 / (likelihood-ratio^k * initial-prior-odds)) - 1 ]
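
(Quick check: with initial-prior-odds = 1, k = 0 gives 1200 * (lb 2 - 1) = 0, and at the last step, where 1/(likelihood-ratio^k * initial-prior-odds) has grown to 3, we get 1200 * (lb 4 - 1) = 1200, so the sweep now runs from 0 up to an octave.)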

Okay, so that's the derivation.

What happens if we instead use the reciprocal of the likelihood ratio? What
would that represent? Well, for instance, instead of
L = lb[ p(white | A) / p(white | B) ],
we would have
L = lb[ p(white | B) / p(white | A) ],
which would be exactly the likelihood ratio of white we would have assigned
had Whitetown and Blackville traded places so that we were more likely to
draw white in Blackville and vice-versa.

What is the point? I dunno, for some reason it seemed like a very
straightforward modification, and it especially caught my interest since
this reciprocal-likelihood-ratio version gave a range of 3/2 when the
original's range was set to 2/1. Also, what it produces is a tuning that,
when subtracted from the other tuning, cent value by cent value, gives the
EDm that results from the log-of-odds expression. I tried to explain why in
another post, but I don't know how much sense it made (even to myself).

I do believe it's a valuable decomposition of EDm, and the fact that the
Bayes tuning over 2/1 is almost exactly the same as Shaahin's ADO seems to
indicate that it may be even more significant as regards typical sensory
progressions.

🔗genewardsmith <genewardsmith@sbcglobal.net>

6/4/2011 11:42:12 AM

--- In tuning-math@yahoogroups.com, Daniel Nielsen <nielsed@...> wrote:
>
> Some .SCL files of the Bayes stuff is at DanNielsen/bayes.zip

I haven't been able to make much of these as tuning systems. Maybe 5 or 7 notes would be good to look at as a scale, if you want to add those.

🔗Daniel Nielsen <nielsed@uah.edu>

6/4/2011 2:27:12 PM

Thanks, Gene, here is the new file:
/tuning-math/files/DanNielsen/

________

On Sat, Jun 4, 2011 at 1:42 PM, genewardsmith <genewardsmith@sbcglobal.net> wrote:

>
>
>
> --- In tuning-math@yahoogroups.com, Daniel Nielsen <nielsed@...> wrote:
> >
> > Some .SCL files of the Bayes stuff is at DanNielsen/bayes.zip
>
> I haven't been able to make much of these as tuning systems. Maybe 5 or 7
> notes would be good to look at as a scale, if you want to add those.
>

🔗Daniel Nielsen <nielsed@uah.edu>

6/4/2011 2:40:17 PM

>
> On Sat, Jun 4, 2011 at 1:42 PM, genewardsmith <genewardsmith@sbcglobal.net> wrote:
>
>> I haven't been able to make much of these as tuning systems. Maybe 5 or 7
>> notes would be good to look at as a scale, if you want to add those.

Funny, something that stands out right away due to the round number of
cents: From a 12-tET POV, 5-tBayes and 7-tBayes seem potentially interesting
(or related?), since they respectively hit 200. and 1000. almost dead on.

🔗Daniel Nielsen <nielsed@uah.edu>

6/5/2011 1:23:42 PM

Does anything look interesting with the following scale?

It uses a slightly different approach:

* Consider 3/2 to be the beginning position

* Use the typical Bayesian construction given previously with N=7 divisions
and a range of m=3/2. This fills in 7 notes from 3/2 to 1/1. (We had
previously inverted results so that they ascended, but these pitches
descend, so don't require the artificial inversion, but still use an offset
to center at 3/2.)

* Take the reciprocal of the likelihood ratio and use N=5. These notes
naturally fill in the space from 3/2 to 2/1 with 5 notes.

* Taken together, we have a 12 note scale over 2/1 (of course, that isn't a
requirement):

! bayes_alt12.scl
!
Alternate Bayesian construction
12
!
112.3794
220.8697
325.3674
425.7850
522.0527
614.1197
3/2
817.7994
925.3725
1024.7917
1116.2483
2/1