
One rather off-topic contribution - The Unicode Character Database

Petr Pařízek <p.parizek@chello.cz>

4/1/2007 1:42:17 AM

Hi there.

Some time ago, one of you mentioned the possibility of adding new symbols
like Sagittal to the Unicode chart. A few months later, I was browsing the
Unicode server and was surprised at how widely the existing musical
characters were scattered across the code space, which made me think such a
task might involve more than one block (or group, if you wish) of characters.
You can find more here: www.unicode.org/Public/zipped/5.0.0/UCD.zip

Petr

Danny Wier <dawiertx@sbcglobal.net>

4/1/2007 8:00:00 AM

----- Original Message ----- From: "Petr Pařízek" <p.parizek@chello.cz>
To: "Tuning List" <tuning@yahoogroups.com>
Sent: Sunday, April 01, 2007 3:42 AM
Subject: [tuning] One rather off-topic contribution - The Unicode Character Database

> Hi there.
>
> Some time ago, one of you mentioned the possibility of adding new symbols
> like Sagittal to the Unicode chart. A few months later, I was browsing the
> Unicode server and I was surprised how much the used characters were
> scattered around the space, which made me think this might be a task of
> touching more than one block (or group, if you wish) of characters. You can
> find more here: www.unicode.org/Public/zipped/5.0.0/UCD.zip

It would use a block of its own, I'd imagine. A large number of extensions added at once tends to get its own space somewhere in there, and it might not be contiguous with the older block. If only a few are added, they're added to the end, so some blocks end up being really piecemeal.

But getting Sagittal encoded in Unicode would be a hard sell right now, wouldn't it? It's going to have to become a widely-used notation before they'll consider including it. They don't even have Tartini-Couper or Arel-Ezgi accidentals encoded. They do have sharps, flats and accidentals with arrows attached, and sharps and flats with a 4-like attachment--but does anyone use these?

Unicode has ancient Greek musical notation, however. They're much more open to including ancient scripts.

~D.

Herman Miller <hmiller@IO.COM>

4/1/2007 4:38:52 PM

Danny Wier wrote:

> It would use a block of its own, I'd imagine. A large number of extensions
> added at once tends to get its own space somewhere in there, and it might
> not be contiguous with the older block. If only a few are added, they're
> added to the end, so some blocks end up being really piecemeal.
>
> But getting Sagittal encoded in Unicode would be a hard sell right now,
> wouldn't it? It's going to have to become a widely-used notation before
> they'll consider including it. They don't even have Tartini-Couper or
> Arel-Ezgi accidentals encoded. They do have sharps, flats and accidentals
> with arrows attached, and sharps and flats with a 4-like attachment--but
> does anyone use these?

It's strange that they picked such an obscure symbol for "quarter tone sharp" and "quarter tone flat", but a particular font could use a more familiar symbol for displaying those characters. I see they only have a single character for the quarter note rest, which is shown as the zigzag version in the chart, but there's another quarter note rest that's still occasionally seen (which looks like an eighth rest combined with a 180-degree rotated eighth rest, or roughly the letter Z), but it doesn't have a Unicode character assigned to it. So they might be considering all the different quarter tone symbols as glyph variants representing the same abstract "character".

Now, one interesting thing is that they have ornamentation characters (U+1D19B through U+1D1A5) that combine to form the different Baroque ornaments. A similar approach could work for Sagittal: encode the individual flags as characters (up and down versions of each), which could then be combined into a single Sagittal symbol. New symbols could be built from existing flags without changing the Unicode definition. The drawback is that the font rendering engine would have to support ligatures in the Musical Symbols block, which probably isn't very high on the priority list at either Microsoft or Apple.
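The combining-flags idea can be sketched roughly as a font-side ligature lookup. This is only an illustration under stated assumptions: the flag characters, their Private Use Area code points, and the glyph names are all hypothetical placeholders (neither Unicode nor Sagittal defines them), and in practice the lookup would live in the font (for example as an OpenType ligature substitution) rather than in Python. It shows the property described above: a new combined symbol needs only a new table entry, not a new character.

```python
# Sketch of the combining-flags idea: encode individual Sagittal flags as
# characters and let a font-style ligature table compose them into one symbol.
# HYPOTHETICAL: every flag character, Private Use code point and glyph name below
# is an invented placeholder, not part of Unicode or of the Sagittal spec.

FLAG_LEFT_BARB_UP = "\uE100"   # hypothetical
FLAG_RIGHT_BARB_UP = "\uE101"  # hypothetical
FLAG_LEFT_ARC_UP = "\uE102"    # hypothetical

# Hypothetical ligature table: sequence of flag characters -> composed glyph name.
LIGATURES = {
    (FLAG_LEFT_BARB_UP,): "sagittal.leftBarbUp",
    (FLAG_LEFT_ARC_UP,): "sagittal.leftArcUp",
    (FLAG_LEFT_BARB_UP, FLAG_RIGHT_BARB_UP): "sagittal.doubleBarbUp",
    (FLAG_LEFT_ARC_UP, FLAG_RIGHT_BARB_UP): "sagittal.arcBarbUp",
}
MAX_LEN = max(len(seq) for seq in LIGATURES)


def shape(text: str) -> list[str]:
    """Greedy longest-match ligature substitution over a string of flags.

    Characters with no table entry pass through as a "U+XXXX" label.
    """
    glyphs, i = [], 0
    while i < len(text):
        # Try the longest possible sequence first, then shorter ones.
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            glyph = LIGATURES.get(tuple(text[i:i + n]))
            if glyph is not None:
                glyphs.append(glyph)
                i += n
                break
        else:  # no sequence starting at position i matched
            glyphs.append(f"U+{ord(text[i]):04X}")
            i += 1
    return glyphs
```

For example, `shape(FLAG_LEFT_BARB_UP + FLAG_RIGHT_BARB_UP)` gives `['sagittal.doubleBarbUp']`. Adding another two-flag symbol would only mean adding one more entry to `LIGATURES`; the character repertoire itself stays unchanged.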

It's odd how priorities are assigned. Recycling symbols for 7 different kinds of plastics have a nice spot in the Miscellaneous Symbols block, but when are those ever used in actual text? Yet the double sharp and double flat symbols are off in the Musical Symbols block in Plane 1, which is unusable in most older software, even though these would be very useful to have in text (so much so that "x" and "bb" are used as substitutes for these unavailable characters). If the plastics industry can have their technical symbols added to the character list, it seems that there ought to be a place for Sagittal and any other notations with an established history of usage.
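Part of why Plane 1 was unusable in older software is the encoding itself. In UTF-16, any code point above U+FFFF, such as U+1D12A (MUSICAL SYMBOL DOUBLE SHARP) and U+1D12B (MUSICAL SYMBOL DOUBLE FLAT), is stored as two 16-bit surrogate code units, so software built around the 16-bit-only UCS-2 model sees two meaningless halves instead of one character. A minimal Python sketch of the split, cross-checked against Python's own UTF-16 codec:

```python
# Why Plane 1 characters trip up older, BMP-only (UCS-2) software:
# a code point above U+FFFF needs two 16-bit code units (a surrogate pair) in UTF-16.

def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary-plane code point")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


DOUBLE_SHARP = 0x1D12A  # MUSICAL SYMBOL DOUBLE SHARP (Plane 1)
DOUBLE_FLAT = 0x1D12B   # MUSICAL SYMBOL DOUBLE FLAT (Plane 1)

for cp in (DOUBLE_SHARP, DOUBLE_FLAT):
    high, low = utf16_surrogates(cp)
    # Cross-check against Python's UTF-16 encoder (big-endian, no byte-order mark).
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    print(f"U+{cp:X} -> {high:04X} {low:04X}")
```

This prints `U+1D12A -> D834 DD2A` and `U+1D12B -> D834 DD2B`: both symbols share the high surrogate, and only the low half differs.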

Danny Wier <dawiertx@sbcglobal.net>

4/1/2007 7:59:14 PM

From: "Herman Miller" <hmiller@IO.COM>
To: <tuning@yahoogroups.com>
Sent: Sunday, April 01, 2007 6:38 PM
Subject: Re: [tuning] One rather off-topic contribution - The Unicode Character Database

> Danny Wier wrote:
...
>> But getting Sagittal encoded in Unicode would be a hard sell right now,
>> wouldn't it? It's going to have to become a widely-used notation before
>> they'll consider including it. They don't even have Tartini-Couper or
>> Arel-Ezgi accidentals encoded. They do have sharps, flats and accidentals
>> with arrows attached, and sharps and flats with a 4-like attachment--but
>> does anyone use these?
>
> It's strange that they picked such an obscure symbol for "quarter tone
> sharp" and "quarter tone flat", but a particular font could use a more
> familiar symbol for displaying those characters. I see they only have a
> single character for the quarter note rest, which is shown as the zigzag
> version in the chart, but there's another quarter note rest that's still
> occasionally seen (which looks like an eighth rest combined with a
> 180-degree rotated eighth rest, or roughly the letter Z), but it doesn't
> have a Unicode character assigned to it. So they might be considering
> all the different quarter tone symbols as glyph variants representing
> the same abstract "character".

True, they could do that and use the slots for quarter-tone sharp and flat generically, but it's not really in the spirit of Unicode. Case in point: Turkish and Romanian use very similar letters for the "sh" sound. The preferred form in Turkish is s-cedilla, while in Romanian it's s-comma-below. No language uses both letters, yet they are still encoded at different code points, not as glyph variants of one character.
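The s-cedilla/s-comma-below distinction is easy to check with Python's standard `unicodedata` module: the two letters have separate code points, and canonical decomposition (NFD) splits them onto different combining marks. A short sketch:

```python
import unicodedata

# Turkish s-cedilla and Romanian s-comma-below: similar shapes, separate
# characters, and different combining marks under canonical decomposition (NFD).
for ch in ("\u015F", "\u0219"):
    parts = unicodedata.normalize("NFD", ch)
    described = " + ".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in parts)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} = {described}")
```

The first decomposes to `s` plus U+0327 COMBINING CEDILLA, the second to `s` plus U+0326 COMBINING COMMA BELOW.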

In fact, both the "Arabic" stroke-flat and Tartini-Couper reversed flat, both types of quarter-tone flat, are used in the Arel-Ezgi system for two different sizes of flat, so they can't be glyph variants anyway.

I doubt I'd have the Z-type eighth rest encoded separately from the 7-type one, however. The printed Cyrillic lowercase T and the written one (which resembles a small Latin 'm') aren't, and shouldn't be.

> It's odd how priorities are assigned. Recycling symbols for 7 different
> kinds of plastics have a nice spot in the Miscellaneous Symbols block,
> but when are those ever used in actual text? Yet the double sharp and
> double flat symbols are off in the Musical Symbols block in Plane 1,
> which is unusable in most older software, even though these would be
> very useful to have in text (so much so that "x" and "bb" are used as
> substitutes for these unavailable characters). If the plastics industry
> can have their technical symbols added to the character list, it seems
> that there ought to be a place for Sagittal and any other notations with
> an established history of usage.

For what it's worth, Plane 1 symbols can be viewed in Internet Explorer 7. I wish I knew if Firefox could.

But I'm thinking: if all else fails, what about encoding Sagittal in the ConScript Unicode Registry? Or is it not meant for that purpose?

~D.

Graham Breed <gbreed@gmail.com>

4/1/2007 8:54:49 PM

Danny Wier wrote:

> True, they could do that, use the slots for quarter-tone sharp and flat
> generically, but it's not really in the spirit of Unicode. Case in point:
> Turkish and Romanian use very similar letters for the sound of "sh". The
> preferred form in Turkish is s-cedilla, but in Romanian it's s-comma-below.
> No language uses both letters, but they still are encoded in different
> locations, not as glyph variants.

I don't think there is a consistent "spirit of Unicode". Rather, different compromises are made in different situations. There's certainly a lot of griping about unified CJK characters. And it turns out that you can't design a single font to satisfy all readers of Arabic. The general spirit of character sets, though, is that you're supposed to leave the appearance to the fonts. I don't think we should take the sample quartertone glyphs in the chart at all seriously.

> In fact, both the "Arabic" stroke-flat and Tartini-Couper reversed flat,
> both types of quarter-tone flat, are used in the Arel-Ezgi system for two
> different sizes of flat, so they can't be glyph variants anyway.

There's "flat down" and "flat up" as well, whatever they're supposed to mean.
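For reference, the accidentals under discussion sit together in the Musical Symbols block. This sketch simply prints their character names as recorded in the Unicode Character Database that ships with Python's `unicodedata`; the names alone say nothing about how the "up" and "down" forms are meant to be used.

```python
import unicodedata

# Print the accidentals encoded in the Musical Symbols block, from the
# double sharp (U+1D12A) through the quarter-tone flat (U+1D133).
# Names come from the Unicode Character Database bundled with Python.
ACCIDENTALS = range(0x1D12A, 0x1D134)

for cp in ACCIDENTALS:
    print(f"U+{cp:X}  {unicodedata.name(chr(cp))}")
```

Among the ten names listed are MUSICAL SYMBOL FLAT UP (U+1D12C) and MUSICAL SYMBOL FLAT DOWN (U+1D12D), alongside the quarter-tone sharp and flat at U+1D132 and U+1D133.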

> I doubt I'd have the Z-type eighth rest encoded separately from the 7-type
> one, however. The printed Cyrillic lowercase T and the written one (which
> resembles a small Latin 'm') aren't, and shouldn't be.

I'm sure we wouldn't have done it this way.

>>It's odd how priorities are assigned. Recycling symbols for 7 different
>>kinds of plastics have a nice spot in the Miscellaneous Symbols block,
>>but when are those ever used in actual text? Yet the double sharp and
>>double flat symbols are off in the Musical Symbols block in Plane 1,
>>which is unusable in most older software, even though these would be
>>very useful to have in text (so much so that "x" and "bb" are used as
>>substitutes for these unavailable characters). If the plastics industry
>>can have their technical symbols added to the character list, it seems
>>that there ought to be a place for Sagittal and any other notations with
>>an established history of usage.
>
> For what it's worth, Plane 1 symbols can be viewed in Internet Explorer 7. I
> wish I knew if Firefox could.

It can.

http://www.i18nguy.com/unicode/unicode-example-intro.html

Internet Explorer 5 works if you encode them the right way. Of course, whatever the browser, you won't see anything if you don't have the right fonts.
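A Plane 1 character can be written into a page in several forms. The sketch below shows three for U+1D12A (double sharp): a hexadecimal and a decimal numeric character reference, and the raw UTF-8 bytes a page served as UTF-8 would contain. Which of these IE5 needed is not stated here, so this only illustrates what each form looks like; a suitable font is still required to see anything.

```python
import html

# One Plane 1 character, U+1D12A MUSICAL SYMBOL DOUBLE SHARP, written three ways.
ch = "\U0001D12A"

hex_ncr = f"&#x{ord(ch):X};"                      # hexadecimal numeric character reference
dec_ncr = f"&#{ord(ch)};"                         # decimal numeric character reference
utf8_bytes = ch.encode("utf-8").hex(" ").upper()  # raw bytes in a UTF-8 page

# Round-trip check: both references decode back to the same single character.
assert html.unescape(hex_ncr) == html.unescape(dec_ncr) == ch

print(hex_ncr, dec_ncr, utf8_bytes)
```

This prints `&#x1D12A; &#119082; F0 9D 84 AA`.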

> But I am thinking. If all else fails, what about encoding Sagittal in the
> ConScript Unicode Registry, or is it not for that purpose?

http://www.evertype.com/standards/csur/

Looks like a perfect fit for Sagittal! I see Herman Miller's got a load of them as well.

Graham

Herman Miller <hmiller@IO.COM>

4/1/2007 8:55:57 PM

Danny Wier wrote:

> For what it's worth, Plane 1 symbols can be viewed in Internet Explorer 7. I
> wish I knew if Firefox could.
>
> But I am thinking. If all else fails, what about encoding Sagittal in the
> ConScript Unicode Registry, or is it not for that purpose?

The Gothic Wikipedia page (http://got.wikipedia.org/) is a good test for Plane 1 support (you'll need to install a Gothic font, but the page has links to fonts). It works in the latest version of Firefox under Windows XP (more or less).

I don't know if the ConScript Unicode Registry is still being maintained, but the range of Unicode characters it uses is in the Private Use Area, and there's no reason we couldn't come up with our own standards for assigning Private Use characters for microtonal notation symbols (including historical ones).
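A community-maintained Private Use assignment could start as a simple table plus a sanity check. In this minimal Python sketch the symbol names and code points in `PROPOSED` are hypothetical examples, not an existing registry; the three ranges are the Private Use Areas defined by Unicode (the BMP area and the two supplementary areas in Planes 15 and 16), and the check flags entries outside them or code points assigned twice.

```python
# Hypothetical community Private Use assignment for microtonal symbols.
# The symbol names and code points in PROPOSED are invented examples only.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (Plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (Plane 16)
]

PROPOSED = {
    "SAGITTAL EXAMPLE SYMBOL UP": 0xE300,       # hypothetical
    "SAGITTAL EXAMPLE SYMBOL DOWN": 0xE301,     # hypothetical
    "TARTINI-COUPER REVERSED FLAT": 0xE310,     # hypothetical
    "AREL-EZGI EXAMPLE ACCIDENTAL": 0xE320,     # hypothetical
}


def in_private_use(cp: int) -> bool:
    """True if cp lies in one of Unicode's Private Use Areas."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def check(table: dict[str, int]) -> list[str]:
    """Return a list of problems: code points outside the PUA or assigned twice."""
    problems: list[str] = []
    seen: dict[int, str] = {}
    for name, cp in table.items():
        if not in_private_use(cp):
            problems.append(f"{name}: U+{cp:04X} is not a private-use code point")
        if cp in seen:
            problems.append(f"{name}: U+{cp:04X} already assigned to {seen[cp]}")
        else:
            seen[cp] = name  # keep the first assignment
    return problems
```

An empty result from `check(PROPOSED)` means every entry is a private-use code point and none is reused.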