Fixes for Sinhala and Tamil visual normalizations. #357
Fixes for Sinhala and Tamil visual normalizations.
Sinhala
The listed rewrite concerns the meaning of the sequence `<ZWJ, VIRAMA, ZWJ>`. In Sinhala, the sequences `<ZWJ, VIRAMA>` and `<VIRAMA, ZWJ>` can represent different cluster formations. See Unicode Standard 15.0.0 Chapter 13.2 Table 13-2 Page 538. We are making the determination here that the intended form is the reduced consonant form Rakaaraansaya, like Ra-phalaa. In some cases, the vowel sign must be moved to the end of the cluster with Rakaaraansaya. The listed Rakaaraansaya sequence rewrites are not exhaustive; more should be added as they are discovered in the wild.
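As a rough illustration only, the two Sinhala rewrites could be sketched in plain Python as below. This is not the Nisaba grammar from this PR. The function name `normalize_rakaaraansaya` is hypothetical, and both rewrite shapes are assumptions read off the description above: the ambiguous `<ZWJ, VIRAMA, ZWJ>` is collapsed to `<VIRAMA, ZWJ>`, and a vowel sign found directly before `<VIRAMA, ZWJ, RA>` is moved after it.

```python
import re

ZWJ = "\u200d"
VIRAMA = "\u0dca"  # SINHALA SIGN AL-LAKUNA
RA = "\u0dbb"      # SINHALA LETTER RAYANNA
# Sinhala dependent vowel signs (the range also spans two unassigned code points).
VOWEL_SIGN = "[\u0dcf-\u0ddf\u0df2\u0df3]"


def normalize_rakaaraansaya(text: str) -> str:
    """Hypothetical sketch of the rewrites described above."""
    # Assumption: <ZWJ, VIRAMA, ZWJ> is read as the reduced-form
    # sequence <VIRAMA, ZWJ>.
    text = text.replace(ZWJ + VIRAMA + ZWJ, VIRAMA + ZWJ)
    # Assumption: a vowel sign placed before the Rakaaraansaya sequence
    # belongs at the end of the cluster.
    return re.sub(f"({VOWEL_SIGN})({VIRAMA}{ZWJ}{RA})", r"\2\1", text)
```

For example, under these assumptions `KA, ZWJ, VIRAMA, ZWJ, RA` becomes `KA, VIRAMA, ZWJ, RA`, and `KA, VOWEL SIGN I, VIRAMA, ZWJ, RA` becomes `KA, VIRAMA, ZWJ, RA, VOWEL SIGN I`.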
Tamil
Currently, `preserve` is only available when the listed sequence is sandwiched between two consonants. However, in Tamil, only a few clusters require preserving `<VIRAMA, ZWNJ>`, such as the K-SSA cluster. Unfortunately, this is not possible to represent in Nisaba today. As a result, `<VIRAMA, ZWNJ>` is unnecessarily preserved in all consonant contexts.
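The gap between the current and the desired Tamil behavior can be sketched in plain Python. This is an illustration, not Nisaba code. The helper `drop_zwnj` is hypothetical, and it assumes that a ZWNJ outside a preserved context is simply removed, which the description above does not state.

```python
import re

VIRAMA = "\u0bcd"             # TAMIL SIGN VIRAMA
ZWNJ = "\u200c"
CONSONANT = "[\u0b95-\u0bb9]"  # Tamil consonant range (includes unassigned gaps)
KA, SSA = "\u0b95", "\u0bb7"


def drop_zwnj(text: str, before: str, after: str) -> str:
    """Remove ZWNJ unless it sits in <before, VIRAMA, ZWNJ, after>.

    `before` and `after` are regex fragments that each match one character.
    """
    pattern = f"(?<!{before}{VIRAMA}){ZWNJ}|{ZWNJ}(?!{after})"
    return re.sub(pattern, "", text)


def current_behavior(text: str) -> str:
    # Preserve <VIRAMA, ZWNJ> between any two consonants.
    return drop_zwnj(text, CONSONANT, CONSONANT)


def desired_behavior(text: str) -> str:
    # Preserve <VIRAMA, ZWNJ> only inside the K-SSA cluster.
    return drop_zwnj(text, KA, SSA)
```

Both functions keep the ZWNJ in `KA, VIRAMA, ZWNJ, SSA`. They differ on other consonant pairs, where `current_behavior` keeps the ZWNJ and `desired_behavior` drops it.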