improve Unicode support #2

dpk · 2015-10-28T08:55:54Z

Use FontForge's “build accented glyph” feature to fill in gaps in the Unicode repertoire. There should be enough character+glyph combinations possible for most European languages …

igalic · 2019-04-30T12:40:00Z

coming here from edwardtufte#22
I really badly miss my ć…

dpk · 2021-03-01T11:06:13Z

Here’s an initial assessment (based on running PyFontaine on the Roman font and reading through the first, like, 5% of its extremely voluminous output) of which characters would be easiest to work on, and which would give the most benefit in terms of number of languages (but alas, not number of speakers) supported. Where PyFontaine only picked up a capital or small letter as missing, it’s safe to say it would be a good idea to have both if we don’t already.

Edit: Superseded (but see the HTML comments if you’re interested), see below.

dpk · 2021-03-02T13:58:54Z

We can use <component /> elements in the UFO glyph outline to build accented characters out of base characters + accent glyphs, as follows:

<?xml version='1.0' encoding='UTF-8'?>
<glyph name="Adieresis" format="2">
  <advance width="414"/>
  <outline>
    <component base="A" />
    <component base="dieresis" xOffset="180" yOffset="222" />
  </outline>
</glyph>

There’s already an Adieresis character in the fonts, of course; this is just an example. But I used it to test where the diacritics should go relative to the base character. The values of 180 and 222 for xOffset and yOffset make the dieresis appear exactly where it sits above the A in the Roman font’s real, original Adieresis character (tested by layering a blue version of this character directly over a red version of the original Adieresis). Further, when I change dieresis to macron or caron or tilde, etc., the accent mark appears to land in the right place, horizontally centred over the base character (because the accent characters are all the same width).

For the lower-case adieresis, building a + dieresis with xOffset and yOffset both 0 (the default) matches the original, but this is almost certainly not true for all base characters.

dpk · 2021-03-02T14:42:54Z

With the above in mind, here are the correct xOffset and yOffset values for characters built from the following bases plus accents above the character in question, calculated using dieresis as the combining character except where noted (and therefore possibly incorrect for others, but this will eventually be checked and noted):

Edit: Superseded, see doc/accent-positions.md

I wish I had a quicker way of finding them out …

dpk · 2021-03-02T14:45:00Z

I guess the benefit is, once I have the offset values for all fonts, I can write a script that will generate the glif files for any combination on demand automatically …

Also need to find good placements for the accent marks over characters like æ, œ, r, and w, for which some languages need accent mark versions, but for which there are no accented versions in the original ET Books fonts. There are also no accented small-cap s or z characters.

The lowercase values likely work for some accents under the letter as well: one can, at least, build an extremely passable s-cedilla out of s + cedilla with the values given above. c + cedilla (offset 33, -10) doesn’t quite match ccedilla, but offset 33, 0 for c + acute looks reasonable.

a + ogonek looks okay-ish with ogonek at offset 155, -10 ish. (I’m assuming -10 as the yOffset for all accents like cedilla which hang from, and are attached, under the character in question.) I suspect we’re not going to get better than okay-ish unless someone felt like coming in and designing an original aogonek character. Also, as I don’t read any languages which use the ogonek, I’m not really qualified to judge how good it looks in practice — I’m just comparing at large scale to a couple of serif fonts I have.
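
The script idea above (generating .glif files from a table of offsets) could look roughly like the following Python sketch. It is only an illustration, not the project’s actual tooling: the function name `build_accented_glif` and its parameters are made up here, the advance width is assumed to be copied from the base glyph, and `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_accented_glif(name, base, accent, advance_width,
                        x_offset=0, y_offset=0, unicode_hex=None):
    """Return GLIF (format 2) XML for a glyph composed of base + accent.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    glyph = ET.Element("glyph", {"name": name, "format": "2"})
    # Assumption: the composite keeps the advance width of its base glyph.
    ET.SubElement(glyph, "advance", {"width": str(advance_width)})
    if unicode_hex is not None:
        # Optional code point mapping, e.g. "0107" for cacute.
        ET.SubElement(glyph, "unicode", {"hex": unicode_hex})
    outline = ET.SubElement(glyph, "outline")
    ET.SubElement(outline, "component", {"base": base})
    accent_attrs = {"base": accent}
    # Offsets of 0 are the default, so they are left out of the XML.
    if x_offset:
        accent_attrs["xOffset"] = str(x_offset)
    if y_offset:
        accent_attrs["yOffset"] = str(y_offset)
    ET.SubElement(outline, "component", accent_attrs)
    ET.indent(glyph)  # pretty-print in place (Python 3.9+)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + ET.tostring(glyph, encoding="unicode") + "\n")


if __name__ == "__main__":
    # Rebuilds the Adieresis example from earlier in the thread.
    print(build_accented_glif("Adieresis", "A", "dieresis", 414, 180, 222))
```

Feeding this a table of per-base offsets (such as the one in doc/accent-positions.md) and looping over the wanted base/accent pairs would give one .glif file per character. The glyph would presumably still need to be registered in the UFO’s contents.plist, which this sketch does not do.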

dpk · 2021-03-02T15:44:55Z

As a goal for what languages to support, it would be nice to support all the Latin-scripted languages of the European Union (that is, all of them except Greek and Bulgarian).

dpk · 2021-03-05T11:49:37Z

Okay, after some moderately successful bodging of characters in FontForge today, I think I’m ready to upgrade what I hope can be achieved from ‘Latin-scripted languages of the European Union’ to also include ‘Latin-scripted languages promoted by the European Charter for Regional or Minority Languages’. Here’s a quick overview of what characters are needed, according to PyFontaine.

Support for any individual character is likely to come first to the Roman font only, then to the bold weights, then to italic, then only maybe to Display Italic. (I haven’t decided yet whether I’ll even keep maintaining Display Italic.)

Languages of the European Union

Danish, English, Estonian, Finnish, French, German, Irish, Italian, Portuguese, Spanish, Swedish

Already fully supported.

Bulgarian

Will not be supported — uses Cyrillic script.

Croatian

Č č Ć ć Đ đ

Czech

Č č Ď ď Ě ě Ň ň Ř ř Ť ť Ů ů

Dutch

Ĳ ĳ

It should be okay to specify these characters by OpenType positioning, I hope.

Greek

Will not be supported — uses Greek script.

Hungarian

Ő ő Ű ű

Latvian

Ā ā Č č Ē ē Ģ ģ Ī ī Ķ ķ ļ Ņ ņ Ū ū

Lithuanian

Ą ą Č č Ė ė Ę ę Į į Ū ū Ų ų

Maltese

Ċ ċ Ġ ġ Ħ ħ Ż ż

Polish

Ą ą Ć ć Ę ę Ń ń Ś ś Ź ź Ż ż

Romanian

Ă ă Ș ș Ț ț

Slovak

Č č Ď ď Ĺ ĺ Ľ ľ Ň ň Ŕ ŕ Ť ť

Slovene/Slovenian

Č č

European Minority/Regional Languages (which are not also EU languages)

All those not mentioned should either already be fully supported, or non-Latin, or (rarely) should be automatically covered when other, related languages are covered.

Assyrian

Will not be supported — uses Syriac script.

Arabic

Will not be supported — uses Arabic script.

Armenian

Will not be supported — uses Armenian script.

Belarusian

Will not be supported — uses Cyrillic script.

Bosnian

See Croatian.

Catalan

Also Valencian.

Ŀ ŀ

Gagauz

See Turkish.

Karaim

Most of these characters are pretty tricky. Maybe give up and don’t support this one.

Ė ė Ƣ ƣ Ꞑ ꞑ Ɵ ɵ Ś ś Ş ş Ь ь Ž ž Ź ź Ƶ ƶ

Kashubian

Ą ą Ã ã Ń ń Ż ż

Kurdish

Ş ş

Kven, Limburgish

Probably already supported.

Macedonian, Moldovan

Will not be supported — use Cyrillic script.

Romani

Č č

Russian

Will not be supported — uses Cyrillic script.

Rusyn

See Slovak.

Scandoromani languages

Probably supported when we support all the other characters.

Sami (all dialects)

Č č Đ đ Ǧ ǧ Ǥ ǥ Ǩ ǩ Ŋ ŋ Ŧ ŧ Ʒ ǯ Ǯ ʒ ʹ

Sorbian, Upper

Ć ć Č č Ě ě Ń ń Ř ř

Sorbian, Lower

Ć ć Č č Ě ě Ń ń Ŕ ŕ Ś ś Ź ź

Tatar

Looks complicated due to multiple competing orthographies with unclear legal statuses.

Turkish

Ğ ğ İ Ş ş

Ukrainian

Will not be supported — uses Cyrillic script.

Welsh

Ŵ ŵ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ŷ ŷ Ỳ ỳ

Yezidi (Kurmanji)

See Kurdish.

Yiddish

Will not be supported — uses Hebrew script.

dpk · 2021-03-05T11:53:45Z

Char.	Langs
č	10
Č	10
ń	4
Ń	4
ć	4
Ć	4
ż	3
Ż	3
ě	3
Ě	3
ą	3
Ą	3
ź	2
Ź	2
ū	2
Ū	2
ť	2
Ť	2
ş	2
Ş	2
ś	2
Ś	2
ř	2
Ř	2
ŕ	2
Ŕ	2
ň	2
Ň	2
ę	2
Ę	2
đ	2
Đ	2
ď	2
Ď	2
ỳ	1
Ỳ	1
ẅ	1
Ẅ	1
ẃ	1
Ẃ	1
ẁ	1
Ẁ	1
ʹ	1
ʒ	1
ț	1
Ț	1
ș	1
Ș	1
ǯ	1
Ǯ	1
ǩ	1
Ǩ	1
ǧ	1
Ǧ	1
ǥ	1
Ǥ	1
Ʒ	1
ŷ	1
Ŷ	1
ŵ	1
Ŵ	1
ų	1
Ų	1
ű	1
Ű	1
ů	1
Ů	1
ŧ	1
Ŧ	1
ő	1
Ő	1
ŋ	1
Ŋ	1
ņ	1
Ņ	1
ŀ	1
Ŀ	1
ľ	1
Ľ	1
ļ	1
ĺ	1
Ĺ	1
ķ	1
Ķ	1
ĳ	1
Ĳ	1
İ	1
į	1
Į	1
ī	1
Ī	1
ħ	1
Ħ	1
ģ	1
Ģ	1
ġ	1
Ġ	1
ğ	1
Ğ	1
ė	1
Ė	1
ē	1
Ē	1
ċ	1
Ċ	1
ă	1
Ă	1
ā	1
Ā	1
ã	1
Ã	1
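
A tally like the one above can be reproduced mechanically from the per-language lists. The sketch below is a guess at how such a count might be done, not the method actually used. It includes only five of the languages from this thread, so its numbers are smaller than those in the table. The names `MISSING`, `count_languages` and `as_rows` are made up for the example. The sort order (count, then code point, both descending) appears to match the ordering of the table.

```python
from collections import Counter

# Missing characters per language, copied from the lists in this thread.
# Only five languages are included to keep the example short.
MISSING = {
    "Croatian": "Č č Ć ć Đ đ",
    "Czech": "Č č Ď ď Ě ě Ň ň Ř ř Ť ť Ů ů",
    "Polish": "Ą ą Ć ć Ę ę Ń ń Ś ś Ź ź Ż ż",
    "Slovene": "Č č",
    "Upper Sorbian": "Ć ć Č č Ě ě Ń ń Ř ř",
}


def count_languages(missing):
    """Count, for each character, how many languages list it as missing."""
    counts = Counter()
    for chars in missing.values():
        # set() guards against a character listed twice for one language.
        counts.update(set(chars.split()))
    return counts


def as_rows(counts):
    """Sort by language count, then by code point, both descending."""
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


if __name__ == "__main__":
    print("Char.\tLangs")
    for char, langs in as_rows(count_languages(MISSING)):
        print(f"{char}\t{langs}")
```

Sorting by number of languages first makes it easy to see which glyphs give the widest coverage for the least drawing work.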

igalic · 2021-03-05T15:45:26Z

Croatian

Č č Ć ć Đ đ

you're missing

Š š Ž ž

(i think Slovene should have the same set, but i'm not familiar with it. even though it's a South Slavic language, to my ears it sounds like a West Slavic language)

Turkish

Ğ ğ İ Ş ş

you're missing

Ö ö Ü ü

this throws your count off

dpk · 2021-03-05T18:29:28Z

All those characters are already in the font — I’m only counting ones that aren’t there already. Thanks for double checking!

igalic mentioned this issue May 1, 2019

Add support for Polish language diacritics to ET Book font edwardtufte/et-book#22

Open

dpk added this to the 2.0 milestone Mar 1, 2021

This was referenced Mar 5, 2021

UFO/OpenType conversion edwardtufte/et-book#25

Open

Double dagger ‡ got clobbered somehow #17

Open
