improve Unicode support #2
Comments
coming here from edwardtufte#22 |
Here’s an initial assessment (based on running PyFontaine on the Roman font and reading through the first, like, 5% of its extremely voluminous output) of which characters would be easiest to work on, and which would give the most benefit in terms of number of languages (but alas, not number of speakers) supported. Where PyFontaine only picked up a capital or small letter as missing, it’s safe to say it would be a good idea to have both if we don’t already. Edit: Superseded (but see the HTML comments if you’re interested), see below. |
We can use <?xml version='1.0' encoding='UTF-8'?>
<glyph name="Adieresis" format="2">
<advance width="414"/>
<outline>
<component base="A" />
<component base="dieresis" xOffset="180" yOffset="222" />
</outline>
</glyph> There’s already an Adieresis character in the fonts, of course; this is just an example. But I used it to test where the diacritics should go relative to the base font. The values of 180 and 222 for xOffset and yOffset … For the lower-case adieresis, building a + dieresis with xOffset and yOffset both 0 (the default) matches the original, but this is almost certainly not true for all base characters. |
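A composite glyph like the Adieresis example above could be generated by a script rather than written by hand. The following is only a rough sketch using Python's standard library: the function name `build_composite_glif` and its parameters are made up for illustration, and it covers nothing beyond the elements shown in the example (advance width plus two components).

```python
import xml.etree.ElementTree as ET


def build_composite_glif(name, width, base, accent, x_offset=0, y_offset=0):
    """Return GLIF (format 2) text for a glyph made of a base and an accent.

    Hypothetical helper: it mirrors only the structure of the Adieresis
    example (advance width plus two components), nothing more.
    """
    glyph = ET.Element("glyph", {"name": name, "format": "2"})
    ET.SubElement(glyph, "advance", {"width": str(width)})
    outline = ET.SubElement(glyph, "outline")
    ET.SubElement(outline, "component", {"base": base})

    accent_attrs = {"base": accent}
    # Offsets default to 0, so they are only written when non-zero.
    if x_offset:
        accent_attrs["xOffset"] = str(x_offset)
    if y_offset:
        accent_attrs["yOffset"] = str(y_offset)
    ET.SubElement(outline, "component", accent_attrs)

    ET.indent(glyph)  # pretty-printing; requires Python 3.9 or later
    body = ET.tostring(glyph, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    # Values taken from the Adieresis example in this thread.
    print(build_composite_glif("Adieresis", 414, "A", "dieresis", 180, 222))
```

Given a table of offsets per base letter and accent, a loop over that table could write one such file per missing character.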
With the above in mind, here are the correct Edit: Superseded, see doc/accent-positions.md I wish I had a quicker way of finding them out … |
I guess the benefit is, once I have the offset values for all fonts, I can write a script that will generate the Also need to find good placements for the accent marks over characters like æ, œ, r, and w, for which some languages need accent mark versions, but for which there are no accented versions in the original ET Books fonts. There are also no accented small-cap s or z characters. The lowercase values likely work for some accents under the letter as well: one can, at least, build an extremely passable s-cedilla out of s + cedilla with the values given above. c + cedilla (offset 33, -10) doesn’t quite match ccedilla, but offset 33, 0 for c + acute looks reasonable. a + ogonek looks okay-ish with ogonek at offset 155, -10 ish. (I’m assuming -10 as the |
As a goal for what languages to support, it would be nice to support all the Latin-scripted languages of the European Union (that is, all of them except Greek and Bulgarian). |
Okay, after some moderately successful bodging of characters in FontForge today, I think I’m ready to upgrade what I hope can be achieved from ‘Latin-scripted languages of the European Union’ to also include ‘Latin-scripted languages promoted by the European Charter for Regional or Minority languages’. Here’s a quick overview of what characters are needed, according to PyFontaine. Support for any individual character is likely to come to the Roman font only first, then to the bold weights, then to italic, then only maybe to Display Italic. (I haven’t decided yet whether I’ll even keep maintaining Display Italic.)

Languages of the European Union

Danish, English, Estonian, Finnish, French, German, Irish, Italian, Portuguese, Spanish, Swedish: Already fully supported.
Bulgarian: Will not be supported (uses Cyrillic script).
Croatian: Č č Ć ć Đ đ
Czech: Č č Ď ď Ě ě Ň ň Ř ř Ť ť Ů ů
Dutch: IJ ij. It should be okay to specify these characters by OpenType positioning, I hope.
Greek: Will not be supported (uses Greek script).
Hungarian: Ő ő Ű ű
Latvian: Ā ā Č č Ē ē Ģ ģ Ī ī Ķ ķ ļ Ņ ņ Ū ū
Lithuanian: Ą ą Č č Ė ė Ę ę Į į Ū ū Ų ų
Maltese: Ċ ċ Ġ ġ Ħ ħ Ż ż
Polish: Ą ą Ć ć Ę ę Ń ń Ś ś Ź ź Ż ż
Romanian: Ă ă Ș ș Ț ț
Slovak: Č č Ď ď Ĺ ĺ Ľ ľ Ň ň Ŕ ŕ Ť ť
Slovene/Slovenian: Č č

European Minority/Regional Languages (which are not also EU languages)

All those not mentioned should either already be fully supported, or non-Latin, or (rarely) should be automatically covered when other, related languages are covered.

Assyrian: Will not be supported (uses Syriac script).
Arabic: Will not be supported (uses Arabic script).
Armenian: Will not be supported (uses Armenian script).
Belarusian: Will not be supported (uses Cyrillic script).
Bosnian: See Croatian.
Catalan (also Valencian): Ŀ ŀ
Gagauz: See Turkish.
Karaim: Ė ė Ƣ ƣ Ꞑ ꞑ Ɵ ɵ Ś ś Ş ş Ь ь Ž ž Ź ź Ƶ ƶ. Most of these characters are pretty tricky. Maybe give up and don’t support this one.
Kashubian: Ą ą Ã ã Ń ń Ż ż
Kurdish: Ş ş
Kven, Limburgish: Probably already supported.
Macedonian, Moldovan: Will not be supported (use Cyrillic script).
Romani: Č č
Russian: Will not be supported (uses Cyrillic script).
Rusyn: See Slovakian.
Scandoromani languages: Probably supported when we support all the other characters.
Sami (all dialects): Č č Đ đ Ǧ ǧ Ǥ ǥ Ǩ ǩ Ŋ ŋ Ŧ ŧ Ʒ ǯ Ǯ ʒ ʹ
Sorbian, Upper: Ć ć Č č Ě ě Ń ń Ř ř
Sorbian, Lower: Ć ć Č č Ě ě Ń ń Ŕ ŕ Ś ś Ź ź
Tatar: Looks complicated due to multiple competing orthographies with unclear legal statuses.
Turkish: Ğ ğ İ Ş ş
Ukrainian: Will not be supported (uses Cyrillic script).
Welsh: Ŵ ŵ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ŷ ŷ Ỳ ỳ
Yezidi (Kurmanji): See Kurdish.
Yiddish: Will not be supported (uses Hebrew script). |
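One way to triage a list like this is to ask Unicode which characters decompose canonically into a base letter plus combining marks (candidates for building from components) and which do not (Đ, Ħ and similar, which need their own outlines). The sketch below uses only the standard `unicodedata` module; the helper names `split_accented` and `classify` are invented for illustration. A Unicode decomposition says nothing about design, so for example ď and ť would still need a differently shaped caron than č.

```python
import unicodedata


def split_accented(char):
    """Return (base letter, [combining mark names]) via canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", char)
    base = decomposed[0]
    marks = [unicodedata.name(mark) for mark in decomposed[1:]]
    return base, marks


def classify(chars):
    """Split chars into those buildable from base + marks and the rest."""
    buildable = {}
    needs_drawing = []
    for ch in chars:
        base, marks = split_accented(ch)
        if marks:
            buildable[ch] = (base, marks)
        else:
            needs_drawing.append(ch)
    return buildable, needs_drawing


if __name__ == "__main__":
    # A small sample taken from the Croatian, Hungarian and Maltese rows.
    buildable, needs_drawing = classify("ČčĐđŐőĦħ")
    for ch, (base, marks) in buildable.items():
        print(ch, "=", base, "+", ", ".join(marks))
    print("no canonical decomposition:", " ".join(needs_drawing))
```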
|
you're missing Š š Ž ž (i think Slovene should have the same set, but i'm not familiar with it. even though it's a South Slavic language, to my ears it sounds like a West Slavic language)
you're missing Ö ö Ü ü; this throws your count off |
All those characters are already in the font — I’m only counting ones that aren’t there already. Thanks for double checking! |
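The counting rule described here (list only what a language needs and the font lacks) amounts to a set difference. A minimal sketch follows, with the assumption that the characters already in the font are supplied by hand; in practice they would be read from the font itself. The name `missing_characters` is made up for illustration.

```python
def missing_characters(required, present):
    """Return the required characters that are absent from the font, sorted by code point."""
    return sorted(set(required) - set(present))


if __name__ == "__main__":
    # Czech carons and ring: the characters listed earlier plus the Š š Ž ž
    # that the thread confirms are already in the font.
    czech = "ČčĎďĚěŇňŘřŠšŤťŮůŽž"
    already_in_font = "ŠšŽž"
    print(" ".join(missing_characters(czech, already_in_font)))
```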
Use FontForge's “build accented glyph” feature to fill in gaps in the Unicode repertoire. There should be enough character+glyph combinations possible for most European languages …