Unicode; UTF-16; surrogate pairs #75

Open
gogoend opened this issue May 4, 2022 · 1 comment
Labels
5% writing progress 5% es

gogoend commented May 4, 2022

Some strings that appear to contain a single character have a length that is not 1?

'😂'.length // 2

'𠮷'.length // 2

'✝️'.length // 2

'9️⃣'.length // 3

'👨‍👩‍👧‍👦'.length // 11
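The lengths above are counts of UTF-16 code units, not of visible characters. A rough sketch of three ways to count the same string follows; the helper names are made up for illustration, and `Intl.Segmenter` is assumed to be available in the runtime.

```javascript
// Hypothetical helper names; only the built-ins they call are standard.
const codeUnitCount = (s) => s.length;        // UTF-16 code units
const codePointCount = (s) => [...s].length;  // the string iterator walks code points

// Intl.Segmenter groups code points into user-perceived characters
// (grapheme clusters). Assumption: the runtime ships Intl.Segmenter.
const graphemeCount = (s) => {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return [...segmenter.segment(s)].length;
};

// Family emoji written with escapes: four emoji joined by three U+200D (ZWJ).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

console.log(codeUnitCount(family));  // 11 (4 surrogate pairs + 3 ZWJ)
console.log(codePointCount(family)); // 7  (4 emoji + 3 ZWJ)
console.log(graphemeCount(family));  // number of grapheme clusters
```

So `.length` answers "how many 16-bit units", spreading answers "how many code points", and only grapheme segmentation gets close to "how many characters a reader sees".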

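Surrogate pairs: a character whose code point is above 0xFFFF has to be converted so that it is stored as two 16-bit code units, because in UTF-16 a single code unit cannot hold it.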

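The Unicode standard specifies that the values U+D800...U+DFFF do not correspond to any character, so they can be used as markers: U+D800...U+DBFF for the first (high) half of a pair and U+DC00...U+DFFF for the second (low) half.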
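Because that range is reserved, a code unit can be classified just by looking at its value: U+D800...U+DBFF is a high (leading) surrogate and U+DC00...U+DFFF is a low (trailing) one. A minimal sketch, with made-up helper names:

```javascript
// Hypothetical helpers: classify a single UTF-16 code unit by its range.
const isHighSurrogate = (unit) => unit >= 0xD800 && unit <= 0xDBFF;
const isLowSurrogate = (unit) => unit >= 0xDC00 && unit <= 0xDFFF;

// Label every code unit of a string.
function describeUnits(s) {
  return Array.from({ length: s.length }, (_, i) => {
    const unit = s.charCodeAt(i);
    if (isHighSurrogate(unit)) return 'high';
    if (isLowSurrogate(unit)) return 'low';
    return 'bmp';
  });
}

console.log(describeUnits('a\u{1F602}')); // [ 'bmp', 'high', 'low' ]
```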

https://juejin.cn/post/7025400771982131236

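It looks like the supplementary planes can hold 2**20 code points (U+10000 through U+10FFFF); the 1024 × 1024 = 2**20 possible surrogate pairs cover all of them exactly.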
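The count can be checked with plain arithmetic. This is only a sanity check of the ranges, not anything taken from the linked article:

```javascript
// Sizes of the two surrogate ranges.
const highCount = 0xDBFF - 0xD800 + 1; // 1024 high surrogates
const lowCount = 0xDFFF - 0xDC00 + 1;  // 1024 low surrogates
const pairCount = highCount * lowCount;

// Code points in the supplementary planes, U+10000 through U+10FFFF.
const supplementaryCount = 0x10FFFF - 0x10000 + 1;

console.log(pairCount);                        // 1048576
console.log(pairCount === 2 ** 20);            // true
console.log(pairCount === supplementaryCount); // true
```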

Converting a supplementary-plane code point to a surrogate pair

H = Math.floor((C - 0x10000) / 0x400) + 0xD800
L = (C - 0x10000) % 0x400 + 0xDC00
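The two formulas above, wrapped in a function as a sketch. `toSurrogatePair` is a made-up name, and the range check is an added assumption about what counts as valid input:

```javascript
// Hypothetical helper: split a supplementary-plane code point into [H, L].
function toSurrogatePair(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('expected a code point in U+10000...U+10FFFF');
  }
  const offset = codePoint - 0x10000;                // 20-bit value
  const high = Math.floor(offset / 0x400) + 0xD800;  // top 10 bits
  const low = (offset % 0x400) + 0xDC00;             // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F602); // U+1F602 is the first emoji above
console.log(high.toString(16), low.toString(16)); // d83d de02
console.log(String.fromCharCode(high, low) === String.fromCodePoint(0x1F602)); // true
```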

Converting a surrogate pair back to a supplementary-plane code point

C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
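The reverse direction as a sketch, checked against the built-in `codePointAt`. `fromSurrogatePair` is a made-up name and the validation is an added assumption:

```javascript
// Hypothetical helper: combine a high and a low surrogate into a code point.
function fromSurrogatePair(high, low) {
  if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF) {
    throw new RangeError('expected a high surrogate followed by a low surrogate');
  }
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const ch = '\u{20BB7}'; // the second example character above
const codePoint = fromSurrogatePair(ch.charCodeAt(0), ch.charCodeAt(1));
console.log(codePoint.toString(16));          // 20bb7
console.log(codePoint === ch.codePointAt(0)); // true
```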

Character variation (variation selectors)

'🈚︎'.split('').map(char => char.charCodeAt().toString(16))
// (3) ['d83c', 'de1a', 'fe0e']
'🈚️'.split('').map(char => char.charCodeAt().toString(16))
// (3) ['d83c', 'de1a', 'fe0f']
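The two strings differ only in their last code unit, U+FE0E (text presentation) versus U+FE0F (emoji presentation). A sketch that builds both from code points and lists them per code point rather than per code unit; the variable names are illustrative:

```javascript
const base = 0x1F21A; // the base character, stored as the pair d83c de1a

// Append a variation selector to request text or emoji presentation.
const textStyle = String.fromCodePoint(base, 0xFE0E);
const emojiStyle = String.fromCodePoint(base, 0xFE0F);

// Array.from iterates by code point, so the surrogate pair stays together.
const codePoints = (s) => Array.from(s, (c) => c.codePointAt(0).toString(16));

console.log(codePoints(textStyle));  // [ '1f21a', 'fe0e' ]
console.log(codePoints(emojiStyle)); // [ '1f21a', 'fe0f' ]
console.log(textStyle.length, emojiStyle.length); // 3 3
```

Whether the two actually render differently depends on the font and platform.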
gogoend added the es and 5% writing progress 5% labels May 4, 2022
gogoend commented Dec 18, 2023

wow
