[Indic] Sinhala base consonant search (spec/Uniscribe vs. HarfBuzz) #123
Comments
So I've been trying to get my head around this (as Pathum has noted, the upstream specs are extremely underspecified in some places; undoubtedly the most recent MS docs are better than they used to be, but they still don't give anywhere near as much detail in Sinhala as they do in the multi-script Indic2 spec). It does seem like U+0DBA takes on post-base form via I mean, either way, the base-search text ought to be clearer. But explicitly saying "skip consonants that take on post-base form" would be in line with what we say in the Indic2 docs, and that is an easier solution than figuring out why there's an unexpected mismatch in how the searches terminate.
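To make the "skip consonants that take on post-base form" wording concrete, here is a minimal Python sketch of a backwards base search driven only by that rule. Everything in it is an assumption for illustration: the function names, the sample syllable, and especially the `takes_post_base_form` predicate, which here simply treats U+0DBA after U+0DCA as post-base. The thread does not establish the real mechanism, nor how this rule should interact with the ZWJ check.

```python
ZWJ = 0x200D
AL_LAKUNA = 0x0DCA  # SINHALA SIGN AL-LAKUNA
YAYANNA = 0x0DBA    # SINHALA LETTER YAYANNA


def is_consonant(cp):
    # Approximation: the Sinhala consonant range, which also covers
    # a few unassigned code points.
    return 0x0D9A <= cp <= 0x0DC6


def takes_post_base_form(syllable, i):
    # HYPOTHETICAL rule, for illustration only: treat YAYANNA directly
    # after AL-LAKUNA as taking post-base form.
    return syllable[i] == YAYANNA and i > 0 and syllable[i - 1] == AL_LAKUNA


def find_base_skipping_post_base(syllable, is_post_base=takes_post_base_form):
    # Walk backwards from the end of the syllable and return the index of
    # the first consonant that does not take post-base form.
    for i in range(len(syllable) - 1, -1, -1):
        if is_consonant(syllable[i]) and not is_post_base(syllable, i):
            return i
    return None


# Hypothetical syllable: KA, AL-LAKUNA, ZWJ, GA, AL-LAKUNA, YA, DIGA KOMBUVA
sample = [0x0D9A, 0x0DCA, 0x200D, 0x0D9C, 0x0DCA, 0x0DBA, 0x0DDA]
base_index = find_base_skipping_post_base(sample)
```

On this made-up input the search skips U+0DBA and stops at U+0D9C, even though U+0D9C follows a ZWJ, because the sketch deliberately leaves the ZWJ check out.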
Did you mean If it is
Agreed, and a quick test shows that this appears to be how DirectWrite handles it. Using a variant of the example in the original post:
gives us:

My interpretation of this output is that U+0DBA takes on post-base form, but U+0D9C doesn't despite being preceded by a ZWJ; therefore U+0D9C is the base (indicated by the U+0DD9 matra moving before it). (HarfBuzz still places the matra at the start of the syllable.) Nice insight/intuition, @n8willis!
HarfBuzz adopts a slightly different approach to Uniscribe / this spec. Consider the syllable (taken from our corpus):

Following this spec, we search backwards from the end of the syllable until we find the consonant U+0DBA SINHALA LETTER YAYANNA. It is not immediately preceded by a ZWJ, therefore it is the base. The left matra U+0DD9 SINHALA VOWEL SIGN KOMBUVA (via decomposition of U+0DDA SINHALA VOWEL SIGN DIGA KOMBUVA) then moves up prior to this base, giving us:

From a quick read of the HarfBuzz source code, what it appears to be doing is starting at the beginning of the syllable and taking the last consonant that is not immediately preceded by a ZWJ.

Therefore, U+0D9A SINHALA LETTER ALPAPRAANA KAYANNA is the base, as the base consonant search is subsequently terminated on encountering the U+200D ZWJ, U+0D9C GAYANNA pair. This gives us:
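For comparison, the two searches as described in this post can be sketched in Python roughly as follows. This is a reading of the prose above, not the actual spec text or the HarfBuzz source; the function names and the sample syllable are assumptions chosen to reproduce the mismatch (the corpus syllable itself is not shown here).

```python
ZWJ = 0x200D


def is_consonant(cp):
    # Approximation: the Sinhala consonant range, which also covers
    # a few unassigned code points.
    return 0x0D9A <= cp <= 0x0DC6


def preceded_by_zwj(syllable, i):
    return i > 0 and syllable[i - 1] == ZWJ


def base_per_spec(syllable):
    # Spec-style search as described above: walk backwards from the end
    # and take the first consonant not immediately preceded by a ZWJ.
    for i in range(len(syllable) - 1, -1, -1):
        if is_consonant(syllable[i]) and not preceded_by_zwj(syllable, i):
            return i
    return None


def base_harfbuzz_like(syllable):
    # HarfBuzz-style search as described above: walk forwards, remember
    # each consonant, and stop at the first consonant preceded by a ZWJ.
    base = None
    for i, cp in enumerate(syllable):
        if is_consonant(cp):
            if preceded_by_zwj(syllable, i):
                break
            base = i
    return base


# Hypothetical syllable: KA, AL-LAKUNA, ZWJ, GA, AL-LAKUNA, YA, DIGA KOMBUVA
sample = [0x0D9A, 0x0DCA, 0x200D, 0x0D9C, 0x0DCA, 0x0DBA, 0x0DDA]
spec_base = base_per_spec(sample)
harfbuzz_base = base_harfbuzz_like(sample)
```

On this sample the spec-style search selects U+0DBA, while the HarfBuzz-style search stops at the ZWJ, U+0D9C pair and keeps U+0D9A, which is the disagreement reported here.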