target two-character string modification #14

Closed
wants to merge 1 commit into from

zyk-mjzs
Contributor

@zyk-mjzs zyk-mjzs commented Apr 24, 2024

For example, String::from_utf16(&[55357, 56447]) successfully decodes the two UTF-16 code units of a character like 👿, but String::from_utf16(&[55357]) returns a FromUtf16Error, because a lone surrogate is not valid UTF-16.
If possible, I would like consecutive unicode values to be processed together, so that errors like this can be avoided.
However, I don't know how to handle strings composed of several unicode values, so the code in this PR is not very good.
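
The behaviour described above can be reproduced with the standard library alone. The sketch below is only an illustration, not the code from this PR: decode_units is a hypothetical helper name, and replacing an unpaired surrogate with U+FFFD is an assumed policy rather than what the parser does.

```rust
use std::char::decode_utf16;

/// Hypothetical helper (not part of the crate): decode a run of UTF-16 code
/// units as a whole, replacing any unpaired surrogate with U+FFFD instead of
/// returning an error.
fn decode_units(units: &[u16]) -> String {
    decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

fn main() {
    // A complete surrogate pair decodes to a single character.
    assert_eq!(String::from_utf16(&[55357, 56447]).unwrap(), "👿");
    // A lone high surrogate is rejected with FromUtf16Error.
    assert!(String::from_utf16(&[55357]).is_err());
    // Decoding the whole run at once handles both cases without panicking.
    assert_eq!(decode_units(&[55357, 56447]), "👿");
    assert_eq!(decode_units(&[55357]), "\u{FFFD}");
}
```

Collecting all consecutive code units first and decoding them in one call is what lets the two halves of a surrogate pair meet.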

@d0rianb
Owner

d0rianb commented Apr 26, 2024

When consecutive unicode values should be merged, a preceding instruction should be provided: \uc{n}, which tells the parser how many of the following unicode characters should be ignored.
I'm considering rewriting the unicode parsing to improve performance and reliability.

@d0rianb d0rianb added the WIP label Apr 26, 2024
@d0rianb
Owner

d0rianb commented Apr 27, 2024

Moreover, it seems that TextEdit is not encoding unicode the right way. According to the specification, any unicode value > 32767 should be expressed as a negative number (value - 65536). But TextEdit encodes the 👿 emoji as: \uc0 \u55357 \u56447.
Word encodes it as: \u-10179\'5f\u-9089\'5f (\uc1 is implicit), which follows the specification and corresponds to the same unicode values (55357 - 65536 = -10179 and 56447 - 65536 = -9089).
But I can't figure out what the rule is for deciding whether these characters should be merged.

A good description of how unicode works is given in this comment.
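
As a rough sketch of the arithmetic above (not necessarily what the later fix does), both the signed form written by Word and the unsigned form written by TextEdit can be mapped to the same UTF-16 code units before decoding. The names rtf_value_to_unit and decode_rtf_units are hypothetical. Treating a high surrogate followed by a low surrogate as one character is the standard UTF-16 pairing rule, assumed here to be a suitable merge rule.

```rust
/// Hypothetical helper: map the numeric parameter of an RTF \uN control word
/// to a UTF-16 code unit. Negative values stand for value + 65536; values
/// already written as unsigned (as TextEdit does) pass through unchanged.
fn rtf_value_to_unit(n: i32) -> u16 {
    n.rem_euclid(65536) as u16
}

/// Hypothetical helper: decode consecutive \uN parameters together, so that a
/// high surrogate (0xD800..=0xDBFF) followed by a low surrogate
/// (0xDC00..=0xDFFF) becomes a single character.
fn decode_rtf_units(values: &[i32]) -> String {
    let units: Vec<u16> = values.iter().map(|&n| rtf_value_to_unit(n)).collect();
    // Unpaired surrogates are replaced with U+FFFD rather than causing an error.
    String::from_utf16_lossy(&units)
}

fn main() {
    // Word's signed encoding and TextEdit's unsigned encoding agree.
    assert_eq!(rtf_value_to_unit(-10179), 55357);
    assert_eq!(rtf_value_to_unit(-9089), 56447);
    assert_eq!(decode_rtf_units(&[-10179, -9089]), "👿");
    assert_eq!(decode_rtf_units(&[55357, 56447]), "👿");
}
```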

@d0rianb
Owner

d0rianb commented Apr 28, 2024

Fixed in 5c80b43.

@d0rianb d0rianb closed this Apr 28, 2024
@d0rianb d0rianb removed the WIP label Apr 28, 2024