
Diatonic hearing modeled as a feature space

Mike Battaglia <battaglia01@...>

2/4/2011 4:57:45 AM

I think that there is more to "diatonic" hearing than 5L2s or the
positions of the pitches in 12-tet.

I think that diatonic hearing is made up of a collection of disparate
psychoacoustic phenomena that we have adapted to prioritize,
search for, and extricate from the incoming signal. An analogous
process of adaptation takes place when we process language: when you
learn another language, you have to start listening for
inflections that your old language treated as equivalent, or else
you'll have a terrible accent and not be able to understand what
anyone's saying. Hence you move towards an internal coding of the
signal that gets closer and closer to the actual entropy of the signal
itself. And I mean entropy from an information theory standpoint, not
just harmonic entropy. Whether language adaptations and music hearing
adaptations are actually the same thing is neither here nor there, but
I wouldn't be surprised if so.

This process has to do with pattern recognition. In a sense, we're
taking the signal and mapping it into a feature space. Feature spaces
are currently a hot topic in DSP, because people are all about them
for building music search engines and auto-categorizing music and
stuff. If you don't know what a feature space is, here's a good place
to start: http://en.wikipedia.org/wiki/Features_%28pattern_recognition%29

Probably better: http://en.wikipedia.org/wiki/Feature_extraction
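To make the idea concrete, here's a toy sketch in Python of what "mapping a signal into a feature space" means. The feature names and thresholds here are completely made up for illustration - they're not a claim about what the auditory system actually computes:

```python
# A toy "feature extractor": map a melody (pitches in cents) to a
# small feature vector. All features and thresholds are invented
# purely for illustration.
def extract_features(pitches_cents):
    steps = [b - a for a, b in zip(pitches_cents, pitches_cents[1:])]
    n = len(steps) or 1
    return {
        # fraction of steps smaller than ~150 cents ("half-step-like")
        "small_step_rate": sum(1 for s in steps if abs(s) < 150) / n,
        # fraction of steps within 20 cents of a just fifth (702 cents)
        "fifth_rate": sum(1 for s in steps if abs(abs(s) - 702) < 20) / n,
        # average absolute step size, in cents
        "mean_step": sum(abs(s) for s in steps) / n,
    }

# C D E F G in 12-tet, expressed in cents
print(extract_features([0, 200, 400, 500, 700]))
```

A listener's "adaptation" in this picture is just a change in which features get computed and how heavily each one is weighted.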

Either way, certain other tunings will probably create lots of
activity in different subspaces of a western listener's feature space
right off the bat. This is to say that different subsets of the
diatonic feature vectors, ones that we've become ultra-sensitive to,
might still jump out at you from other tunings. Pajara, for example,
tends to excite similar features as does the diatonic scale. In
Graham's Hardy piece in miracle, you might also hear some diatonic
stuff. Blackwood excites the diatonic feature space quite a bit. Etc.

Now, some diatonic features activate when you listen to porcupine[7],
but sometimes there's noise too - you sometimes can't tell if a note
is a "major third" or a "minor third," in whatever sense those
terms mean to you personally. Sooner or later you'll have to adapt and
learn that the SNR for this feature, applied to this tuning, is just a
bit too low. So you'll have to either ditch the concept entirely, or
improve your algorithm for figuring it out. Either way, once the
problem is resolved, porcupine should suddenly become a lot more
"intelligible" to you.
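The "noisy third detector" can be sketched the same way. Assume (purely for illustration) that a diatonic-trained listener classifies thirds by distance from the just minor and major third, and rejects anything too far from either category center:

```python
# Hypothetical third-classifier trained on diatonic categories.
# Distance from a category center plays the role of "noise";
# the tolerance is an invented number, not a perceptual constant.
MINOR_THIRD = 315.6   # 6/5 in cents
MAJOR_THIRD = 386.3   # 5/4 in cents

def classify_third(cents, max_noise=25.0):
    d_min = abs(cents - MINOR_THIRD)
    d_maj = abs(cents - MAJOR_THIRD)
    if min(d_min, d_maj) > max_noise:
        return "ambiguous"   # too far from either category center
    return "minor third" if d_min < d_maj else "major third"

print(classify_third(300.0))   # 12-tet minor third
print(classify_third(350.0))   # neutral third: falls between categories
```

A neutral third comes back "ambiguous" - that's the low-SNR case. Adapting means either widening the tolerance or growing a new "neutral third" category.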

So there are two interesting things that could happen from this perspective:

The first is when a feature that correlated well to the input signal
for a diatonic scale fails entirely for some other scale, e.g.
porcupine. This interferes with the intelligibility of the signal, and
the signal will just sound ambiguous and weird to you; further
adaptation is necessary. Neutral thirds and major thirds are similar
enough in size to group them together in some cases, but they're also
different enough in size to distinguish them in different cases,
depending on how your personalized feature extraction algorithm is
running.

The second is that sometimes a feature that maps well to a diatonic
scale ALSO maps well to another scale. Sometimes people tend to think
that this is because they have been "tainted" with the curse of a
lifetime of diatonic hearing, but in this perspective these things are
still valid features of the signal, and meantone doesn't really have a
claim on them. The feature of a "leading tone resolving," for
instance, can apply whether or not you're in meantone at all.

This is a very sketchy and preliminary analysis, and to be honest
feature extraction is something I know about broadly rather than
in depth. But I think this paradigm might be useful for
analyzing what's going on, because all of our disagreements in this
regard can be represented as concrete statements about the feature
space. If you believe that the only way we can adapt, after a lifetime
of diatonic hearing, to understand porcupine temperament is to diverge
from "diatonic" hearing entirely - then you have formed two isolated
feature spaces and you believe that this is the only possible setup
that could exist. If, on the other hand, you feel that it's possible
to adapt in such a way that diatonic and porcupine hearing merge into
one generalized structure of hearing, then you have now set up a
higher-dimensional feature space that the other two form a subset of.

If you believe that all hearing is related to JI, and you now enjoy an
ultra-fine ability to discriminate between similar intervals, and you
hate something like 12-tet because it equates 81/64 and 5/4, then you
have somehow set up a feature space in which 81/64 and 5/4 map to
completely different features (which I don't think is possible, but
that's just me), and you do not enjoy the loss of dimensionality as
you start equating things again. You also feel that this is the only
positive adaptation possible; e.g. that developing a feature space
which could extract information out of deliberately tempered intervals
is in some sense maladaptive and "lossy."
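For reference, the two intervals 12-tet equates differ by the syntonic comma, which is easy to check:

```python
from math import log2

def cents(ratio):
    return 1200 * log2(ratio)

print(round(cents(81/64), 1))                # Pythagorean major third
print(round(cents(5/4), 1))                  # just major third
print(round(cents(81/64) - cents(5/4), 1))   # syntonic comma, 81/80
```

That's about 21.5 cents - so a feature space separating them would need a category boundary finer than that, everywhere, all the time.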

If you believe that all of music derives from JI, then you believe
that a lot of the features are fixed from birth and correspond to JI
intervals (or at least the ones we can discern). If you believe that
all of music derives from general psychoacoustics, and the HE is a
more apt model, then you believe that the features are still fixed
from birth, but correspond to a handful of JI intervals and an
additional "mistuning" feature. If you believe that the latter is
involved, but that the JI intervals tend to be clustered into
different perceptual groups (like how the diatonic scale equates 7/6
and 6/5 in terms of functionality), then you believe that a lot of the
features are still fixed from birth, correspond to a handful of JI
intervals and an additional "mistuning" feature, and that the feature
vectors are dynamic and changing - different ones can be prioritized,
different ones can be equated and differentiated, etc.
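The "handful of JI intervals plus a mistuning feature" view can be sketched as coding every interval as a (nearest JI category, signed mistuning) pair. The category list below is an illustrative handful, not a claim about which intervals are actually innate:

```python
# Sketch of the "fixed JI categories plus a mistuning feature" view:
# each interval is coded as (nearest JI category, signed mistuning).
JI_TARGETS = {"3/2": 702.0, "4/3": 498.0, "5/4": 386.3, "6/5": 315.6}

def ji_features(cents):
    name, target = min(JI_TARGETS.items(), key=lambda kv: abs(cents - kv[1]))
    return name, round(cents - target, 1)

print(ji_features(700.0))   # 12-tet fifth: slightly flat 3/2
print(ji_features(327.3))   # a porcupine third: sharp 6/5
```

The dynamic-vector view then says the category list and the weighting of the mistuning feature are themselves plastic, not fixed.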

If you believe that all of music derives from cognitive things outside
of psychoacoustics, sort of an extremist-Rothenberg perspective, then
you believe that the features have more to do with how different notes
match up to one another in size. Rothenberg's specific claim that an
improper scale is useless for melody is equivalent to the claim that
interval-width ordering cues are such a fundamental and predominant
feature of the signal that it is extremely difficult to set up a
feature space without it. Rothenberg's claim that subsets of improper
scales can work for melody is equivalent to the claim that it is
possible to play subsets of the scale such that the SNR for that
feature stays acceptably high. Paul's claim that
improper scales are fine for melody is equivalent to the claim that he
has set up a feature space that has adapted to handle the sometimes
low SNR for this vector.
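Rothenberg's interval-width-ordering feature is concrete enough to compute. Here's a minimal propriety check, assuming the scale is given as degrees in cents within one 1200-cent octave:

```python
# Minimal Rothenberg propriety check: a scale is "proper" if no
# k-step interval is wider than any (k+1)-step interval.
def is_proper(scale_cents, period=1200.0):
    n = len(scale_cents)
    def interval(i, k):  # size of the k-step interval from degree i
        octaves, j = divmod(i + k, n)
        return scale_cents[j] + octaves * period - scale_cents[i]
    for k in range(1, n - 1):
        if max(interval(i, k) for i in range(n)) > \
           min(interval(i, k + 1) for i in range(n)):
            return False
    return True

print(is_proper([0, 200, 400, 500, 700, 900, 1100]))  # 12-tet diatonic
# Pythagorean diatonic: the 611.8-cent augmented fourth (a 3-step
# interval) exceeds the 588.2-cent diminished fifth (a 4-step one).
print(is_proper([0, 203.9, 407.8, 498.0, 702.0, 905.9, 1109.8]))
```

In feature-space terms, an improper scale is one where the width-ordering cue occasionally contradicts itself.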

The claim that MOS's are easier to grasp is equivalent to the claim
that these scales are easier to perform feature extraction on, in
general, due to some predominant feature that we all share. Whether or
not this is inborn has to do with the mutability of the vectors.

The claim that atonal people can hear "inversions" of chords as being
equivalent to the actual chords themselves is equal to the claim that
they can attenuate periodicity feature extraction, and have built an
additional "symmetry" feature. The claim that they're full of it and
just being pretentious is equal to the claim that this feature space
is somehow biologically unattainable.

The following are some possible things that could be extracted as
"features" from a signal:
- That a sequence of notes fleshes out a certain background structure
(e.g. that they're part of a scale)
- That these scales repeat at the octave (or something else)
- That a certain chord/structure has high or low tonalness
- That a certain structure has high or low roughness
- That a certain structure tends to be followed by another structure
- That a certain structure is an "upside-down" version of another structure
- That the roots often move by an approximate 3/2
- That melodies tend to move in terms of whole and half steps in an
approximate 2:1 ratio
- That motion by half step tends to be used to move from a
high-entropy chord to a low-entropy chord
- That a certain structure makes you think of a certain color
- That different structures make you think of different colors
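Some of these are computable directly. Here's the "whole and half steps in an approximate 2:1 ratio" feature as a toy extractor - it just takes the ratio of a scale's largest step to its smallest, which is purely illustrative:

```python
# Toy extractor for the "step sizes in roughly 2:1 ratio" feature:
# ratio of largest to smallest step in one period of a scale.
def step_size_ratio(scale_cents, period=1200.0):
    degrees = scale_cents + [scale_cents[0] + period]
    steps = sorted(b - a for a, b in zip(degrees, degrees[1:]))
    return steps[-1] / steps[0]

# Diatonic scale in 12-tet: exactly 2:1
print(round(step_size_ratio([0, 200, 400, 500, 700, 900, 1100]), 2))
# Porcupine[7] in 22-tet: closer to 4:3
print(round(step_size_ratio([0, 163.6, 327.3, 490.9, 654.5, 818.2, 981.8]), 2))
```

A diatonic-trained listener expecting a value near 2 gets a mild surprise from porcupine's flatter step profile - another candidate feature to re-weight.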

Here are some features that we haven't seen activated much in music:
- That different structures resemble different phonemes
- That structures beating at certain rates are followed by structures
beating at other rates
- That detuned timbres are followed by perfectly tuned timbres

etc. Perhaps some people here might have more insight into this.

-Mike