Tuning accuracy - Digital synthesis engines

J. Scott <cgscott@xxxxxxx.xxxx>

5/16/1999 12:14:49 AM

Howdy all,

There have been periodic comments about the tuning resolution of
digital hardware.

I was an engineer on three digital synthesis engines,
each of comparable complexity (in terms of transistor
count) to the 486. In addition to a little IC design and
verification, I also wrote (with another guy) hardware drivers
for the chips, designed and laid out a couple of multi-channel,
low noise (100dB S/N) soundcards, wrote several Win95 applications
using the chips, implemented a scriptable, programmable,
multichannel effects system, and ported one of my interactive,
real-time, 3-D positional audio algorithms to work on the chips
(supporting up to 128 simultaneous 3D streams - in 1996!)
Oh and I also threw in a dynamic stereo widening '3D' effect
for good measure and wrote and illustrated all the manuals.

So I know a little bit about digital synthesis.

SIDEBAR
--------------------------------------------------------------
| The main reason I got involved in engineering was because I
| needed more control, in terms of being able to design my
| own instruments, as a composer. Quick summary: particular
| fields of interest are spatialization of sound and nonoctave
| tunings; I'm a get-your-hands-dirty,
| see-what-sounds-good-to-me composer, but not a
| castles-in-the-sky numerologist.
--------------------------------------------------------------

These comments describe tuning on a typical digital synth. They apply
to wavetable based instruments. FM instruments and physically modelled
instruments and analog synths with digital oscillators all use
wavetables at some point to generate waveforms.

There are some simplifications and generalizations but you should get
the general idea.

------------------------------------------------------------------------------------------------------------
Notes: * This post contains a wide table which may present
problems in viewing, depending on your email software.
Turn off word-wrap, widen the window, or save and
view in a text editor if you have problems. Be sure
to use a monospaced font to see it lined up properly.

* There are two common audio uses of the noun "sample".
I distinguish between them by using the following terms:
1. I use the term "wave recording" to mean what is
often called "sample". Example: "A wave recording
of a saxophone."
2. I use the term "sample" to mean a single number
within a wave recording. Example: "Sample number
1,983,780 in this AIFF file was clipped."
------------------------------------------------------------------------------------------------------------

There are two places where tuning is a factor on a digital synth:

1. The tuning tables. Usually in log-of-frequency format with
a resolution of cents, 1024ths of an octave, or 64ths of a
semitone. This is what you have access to from the front
panel if the tuning is editable. But even if not editable,
a tuning table still exists. The code that handles this is
in the instrument's firmware in the case of standalone
instruments and some soundcards, or software driver in the
case of other soundcards.

These numbers are mixed with the patch settings and the details
of the wavetables being used in order to generate a number
that is given to the synthesis hardware to read the wavetables
and generate the frequency you want.

2. The digital synthesis hardware. This is where the actual
frequency resolution is determined. It is not the same as
the resolution which is available through the tuning tables.
The digital synthesis engine is usually either an ASIC
(Application Specific Integrated Circuit) or a general
purpose DSP chip. The chip that produces the digital waveforms
doesn't use cents when addressing the waveform samples. It
specifies tuning through the use of a phase increment
register which I'll call "dPhase". dPhase is typically
a 24-bit fixed point number, 16 bits of which are the
fraction. Thus dPhase ranges from 0.000000000000 to
255.999984741211 in 0.000015258789 step increments.
Every sample period, dPhase is added to the wave pointer -
the pointer to the wavetable where the samples are. Wrap
around occurs at some point where the end of the wavetable
is in the sample RAM or ROM. So if dPhase is 1.0 then
every sample is read in the table and the output pitch
is equal to the pitch at which the wave recording was
originally recorded. If dPhase is 2.0, then every other
sample is skipped and so the pitch is an exact just
octave above the original pitch. If dPhase is 1.5,
then the pitch is an exact just fifth above the original
pitch. And so forth. When the sample address is fractional,
then the synthesis engine will usually interpolate
between the samples on either side according to
some interpolation scheme.
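
A minimal sketch of the 24-bit dPhase register described in item 2, with 8 integer bits and 16 fractional bits. The function names and the round-to-nearest choice are assumptions made for illustration; a real chip or driver may truncate instead.

```python
FRAC_BITS = 16                     # fractional bits in dPhase
TOTAL_BITS = 24                    # total register width
SCALE = 1 << FRAC_BITS             # 65536 register steps per 1.0
MAX_REG = (1 << TOTAL_BITS) - 1    # largest raw register value

def dphase_to_register(dphase):
    """Quantize a real-valued dPhase to the nearest raw register value.

    Round-to-nearest is an assumption; hardware may truncate.
    """
    reg = round(dphase * SCALE)
    if not 0 <= reg <= MAX_REG:
        raise ValueError("dPhase does not fit in an 8.16 register")
    return reg

def register_to_dphase(reg):
    """Convert a raw register value back to the dPhase it represents."""
    return reg / SCALE

if __name__ == "__main__":
    print(register_to_dphase(1))           # smallest step, 1/65536
    print(register_to_dphase(MAX_REG))     # largest representable dPhase
    print(hex(dphase_to_register(1.5)))    # a just fifth above the recorded pitch
```

Simple ratios such as 1.5 or 2.0 land exactly on a register value; most other targets have to be rounded to a neighbouring step.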

Example:

The output sampling rate is 48000 Hz. We are playing
a sine wave that is in a 256-sample-long wavetable in
the sample ROM starting at address 1024. We want to play
at a frequency of 600 Hz.

dPhase = CycleLengthInSamples * FrequencyInHz / SampleRateInHz
= 256 * 600 / 48000
= 3.2

So to generate the output samples we would start by getting a sample
at location 1024. After 1/48000 of a second we would advance the
wave pointer to 1024 + 3.2 = 1027.2. We'd grab samples from locations
1027 and 1028 and output 0.8 times the value at location 1027 plus
0.2 times the value at location 1028 (assuming linear interpolation).
And 1/48000 of a second later we'd do it again, advancing to 1030.4 ...
until we go past the end of the table at location 1024+256 = 1280,
where we would subtract 256 to get back towards the beginning of
the wave table.
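
The walk through the wavetable can be sketched in a few lines. This is only an illustration under stated assumptions: the `rom` list, the `render` function, and the choice to keep the wave pointer as a fixed-point integer with 16 fractional bits are all hypothetical, not a description of any particular chip.

```python
import math

SAMPLE_RATE = 48000      # Hz
TABLE_START = 1024       # ROM address of the first sample of the cycle
CYCLE_LENGTH = 256       # samples in one cycle
FRAC_BITS = 16
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0

# Hypothetical sample ROM holding one sine cycle at TABLE_START.
rom = [0.0] * 2048
for i in range(CYCLE_LENGTH):
    rom[TABLE_START + i] = math.sin(2 * math.pi * i / CYCLE_LENGTH)

def render(frequency_hz, num_samples):
    """Return (wave pointer, output sample) pairs, one per sample period."""
    # Quantize dPhase to the register resolution (round-to-nearest assumed).
    dphase = round(CYCLE_LENGTH * frequency_hz / SAMPLE_RATE * ONE)
    phase = 0            # fixed-point offset from TABLE_START
    out = []
    for _ in range(num_samples):
        index = phase >> FRAC_BITS              # integer part of the offset
        frac = (phase & (ONE - 1)) / ONE        # fractional part of the offset
        a = rom[TABLE_START + index]
        b = rom[TABLE_START + (index + 1) % CYCLE_LENGTH]
        pointer = TABLE_START + phase / ONE
        # Linear interpolation: the nearer sample gets the larger weight.
        out.append((pointer, (1 - frac) * a + frac * b))
        # Advance and wrap at the end of the cycle.
        phase = (phase + dphase) % (CYCLE_LENGTH * ONE)
    return out

if __name__ == "__main__":
    for pointer, value in render(600, 4):
        print(round(pointer, 4), round(value, 6))
```

At 600 Hz the second pair has a pointer of about 1027.2, so the output blends roughly 0.8 of the sample at 1027 with 0.2 of the sample at 1028. The pointer is a hair under 1027.2 because 3.2 is not exactly representable with 16 fractional bits.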

------------------------------------------------------------------------------------------------------------
Next I will present tables showing the native tuning resolution
of a typical synthesis engine. You can sometimes bypass the
tuning tables and access this directly if you write your own
soundcard driver.

Note in particular that the pitch accuracy you get is not fixed -
it depends on the frequency you are shooting for, which is
related to the cycle length in samples of the wavetable you are
using.
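
To make the point about accuracy concrete, here is a rough sketch of what happens when a target frequency is rounded to the nearest dPhase the hardware can hold. The function name `nearest_playable` and the round-to-nearest behaviour are assumptions for illustration; 16 fractional bits are taken from the description above.

```python
import math

FRAC_BITS = 16
SCALE = 1 << FRAC_BITS

def nearest_playable(target_hz, cycle_length, sample_rate=44100):
    """Return (register value, actual Hz, error in cents) for a target pitch."""
    ideal_dphase = cycle_length * target_hz / sample_rate
    reg = round(ideal_dphase * SCALE)            # nearest representable dPhase
    actual_hz = sample_rate / cycle_length * reg / SCALE
    error_cents = 1200 * math.log2(actual_hz / target_hz)
    return reg, actual_hz, error_cents

if __name__ == "__main__":
    # Same target pitch, two different wave recordings.
    for cycle_length in (128, 4096):
        print(cycle_length, nearest_playable(440.0, cycle_length))
```

The worst-case error is half of one dPhase step, so the longer cycle (larger dPhase for the same pitch) can land closer to the target.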

------------------------------------------------------------------------------------------------------------

Column Headings
---------------

CycleLength = for some particular wave recording, the
number of samples per cycle at the fundamental
pitch of the wave recording

dPhase = internal representation of pitch to the
hardware - how many samples to skip per
sample period. Sometimes called the "Phase
Increment".

Freq. A (Hz) = SampleRate/CycleLength * dPhase
For the given CycleLength and dPhase, the resulting pitch.

epsilon is the smallest amount by which dPhase can be changed, in this
case 1/65536 since there are 16 fractional bits.

Freq. B (Hz) = SampleRate/CycleLength * (dPhase + epsilon)
The next higher frequency above Freq. A at which
this wave recording can be played.

Freq. B/Freq. A
= Smallest Pitch difference that can be played for this
CycleLength and dPhase, as a ratio.

cents resolution
= 100*log(Freq. B/Freq. A)/log(2^(1/12))
Smallest Pitch difference that can be played for this
CycleLength and dPhase, in cents.
------------------------------------------------------------------------------------------------------------
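
The column definitions translate directly into a short calculation. The sketch below is one way to recompute a row of the table; the function name `table_row` is made up for illustration, and epsilon is taken as 1/65536 as stated above.

```python
import math

SAMPLE_RATE = 44100      # Hz
EPSILON = 1 / 65536      # smallest dPhase step with 16 fractional bits

def table_row(cycle_length, dphase):
    """Return (Freq. A, Freq. B, Freq. B/Freq. A, cents resolution)."""
    freq_a = SAMPLE_RATE / cycle_length * dphase
    freq_b = SAMPLE_RATE / cycle_length * (dphase + EPSILON)
    ratio = freq_b / freq_a
    # 1200*log2(ratio) is the same as 100*log(ratio)/log(2**(1/12)).
    cents = 1200 * math.log2(ratio)
    return freq_a, freq_b, ratio, cents

if __name__ == "__main__":
    for dphase in (0.25, 0.5, 1, 2, 4, 8):
        print(128, dphase, table_row(128, dphase))
```

Note that the ratio and the cents figure depend only on dPhase, since SampleRate/CycleLength cancels out of Freq. B/Freq. A.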

SampleRate = 44100 Hz

CycleLength  dPhase  Freq. A (Hz)    dPhase+epsilon  Freq. B (Hz)      Freq. B/Freq. A  cents resolution
-----------  ------  --------------  --------------  ----------------  ---------------  ----------------
128          0.25    86.1328125      0.250015258789  86.13806962967    1.000061035156   0.105662916146
128          0.5     172.265625      0.500015258789  172.2708821297    1.000030517578   0.052832264192
128          1       344.53125       1.000015258789  344.5365071297    1.000015258789   0.026416333632
128          2       689.0625        2.000015258789  689.0677571297    1.000007629395   0.013208217201
128          4       1378.125        4.000015258789  1378.13025713     1.000003814697   0.006604121196
128          8       2756.25         8.000015258789  2756.25525713     1.000001907349   0.003302063747
256          0.25    43.06640625     0.250015258789  43.06903481483    1.000061035156   0.105662916146
256          0.5     86.1328125      0.500015258789  86.13544106483    1.000030517578   0.052832264192
256          1       172.265625      1.000015258789  172.2682535648    1.000015258789   0.026416333632
256          2       344.53125       2.000015258789  344.5338785648    1.000007629395   0.013208217201
256          4       689.0625        4.000015258789  689.0651285648    1.000003814697   0.006604121196
256          8       1378.125        8.000015258789  1378.127628565    1.000001907349   0.003302063747
512          0.25    21.533203125    0.250015258789  21.53451740742    1.000061035156   0.105662916146
512          0.5     43.06640625     0.500015258789  43.06772053242    1.000030517578   0.052832264192
512          1       86.1328125      1.000015258789  86.13412678242    1.000015258789   0.026416333632
512          2       172.265625      2.000015258789  172.2669392824    1.000007629395   0.013208217201
512          4       344.53125       4.000015258789  344.5325642824    1.000003814697   0.006604121196
512          8       689.0625        8.000015258789  689.0638142824    1.000001907349   0.003302063747
1024         0.25    10.7666015625   0.250015258789  10.76725870371    1.000061035156   0.105662916146
1024         0.5     21.533203125    0.500015258789  21.53386026621    1.000030517578   0.052832264192
1024         1       43.06640625     1.000015258789  43.06706339121    1.000015258789   0.026416333632
1024         2       86.1328125      2.000015258789  86.13346964121    1.000007629395   0.013208217201
1024         4       172.265625      4.000015258789  172.2662821412    1.000003814697   0.006604121196
1024         8       344.53125       8.000015258789  344.5319071412    1.000001907349   0.003302063747
2048         0.25    5.38330078125   0.250015258789  5.383629351854    1.000061035156   0.105662916146
2048         0.5     10.7666015625   0.500015258789  10.7669301331     1.000030517578   0.052832264192
2048         1       21.533203125    1.000015258789  21.5335316956     1.000015258789   0.026416333632
2048         2       43.06640625     2.000015258789  43.0667348206     1.000007629395   0.013208217201
2048         4       86.1328125      4.000015258789  86.1331410706     1.000003814697   0.006604121196
2048         8       172.265625      8.000015258789  172.2659535706    1.000001907349   0.003302063747
4096         0.25    2.691650390625  0.250015258789  2.691814675927    1.000061035156   0.105662916146
4096         0.5     5.38330078125   0.500015258789  5.383465066552    1.000030517578   0.052832264192
4096         1       10.7666015625   1.000015258789  10.7667658478     1.000015258789   0.026416333632
4096         2       21.533203125    2.000015258789  21.5333674103     1.000007629395   0.013208217201
4096         4       43.06640625     4.000015258789  43.0665705353     1.000003814697   0.006604121196
4096         8       86.1328125      8.000015258789  86.1329767853     1.000001907349   0.003302063747
------------------------------------------------------------------------------------------------------------
The main points:

* For a given wave table, lower frequencies (smaller dPhase values)
have coarser pitch resolution.

* The pitch resolution for a given fundamental frequency depends
on the wave table being used - in particular how many samples
are used to store one cycle at the fundamental pitch.
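
As a rough illustration of the second point, the sketch below holds the target frequency fixed and varies only the cycle length of the wave recording. The function name `resolution_cents` is hypothetical; the 44100 Hz rate and 16 fractional bits are the values used in the table.

```python
import math

def resolution_cents(freq_hz, cycle_length, sample_rate=44100, frac_bits=16):
    """Smallest playable pitch step, in cents, near freq_hz."""
    dphase = cycle_length * freq_hz / sample_rate
    steps = dphase * (1 << frac_bits)      # dPhase measured in register steps
    # One more register step raises the frequency by a factor of 1 + 1/steps.
    return 1200 * math.log2(1 + 1 / steps)

if __name__ == "__main__":
    # 86.1328125 Hz is the pitch that appears once per cycle length in the table.
    for cycle_length in (128, 256, 512, 1024, 2048, 4096):
        print(cycle_length, round(resolution_cents(86.1328125, cycle_length), 6))
```

For the same 86.13 Hz pitch the step shrinks from about 0.106 cents with a 128-sample cycle to about 0.0033 cents with a 4096-sample cycle, a factor of 32, in line with the table.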

By the way, csound has better pitch resolution because you typically
use (I think) 32 bit floating point numbers for your tuning tables
AND for your phase increment and wave pointers/table indices. But
the general idea is the same.
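
For comparison, here is a sketch of the step size if the phase increment were held as a 32-bit IEEE 754 float. That storage format is an assumption taken from the guess above, not something checked against csound itself, and the helper names are made up. It only looks at how finely the increment can be stored, not at rounding in the wave pointer as it accumulates.

```python
import math
import struct

def next_float32_up(x):
    """Return the next representable float32 above a positive, finite x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

def float32_resolution_cents(dphase):
    """Pitch step, in cents, between dphase and the next float32 above it."""
    stored = struct.unpack("<f", struct.pack("<f", dphase))[0]  # round to float32
    return 1200 * math.log2(next_float32_up(stored) / stored)

if __name__ == "__main__":
    for dphase in (0.25, 1.0, 8.0):
        print(dphase, float32_resolution_cents(dphase))
```

Under this assumption the step stays at roughly 0.0002 cents whether dPhase is 0.25 or 8, whereas the fixed-point register gets coarser as dPhase gets smaller.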

- Jeff