IDN support #85

Johann150 · 2021-11-07T20:41:13Z

IDNs (internationalized domain names) are not yet supported when parsing domain names.

However, there is a concern that supporting them could break existing posts. For example, the following could be parsed incorrectly:

@[email protected]ありがとう

One idea would be to recognize an IDN only if it is followed by a space, but that might not always work.

I am not sure how best to solve this, but I think not implementing IDNs is a very inelegant solution that needs to be fixed. I have already seen a few people who were irritated by the lack of proper IDN support.

Johann150 · 2021-11-07T20:43:42Z

See also misskey-dev/misskey#5826

marihachi · 2021-11-09T15:48:03Z

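In the current specification, non-ASCII characters cannot be used directly in the host name of a mention.
To represent non-ASCII characters, they have to be converted to Punycode.

Do you mean changing this so that the host name part can also be written in Unicode?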

Johann150 · 2021-11-10T18:23:43Z

Yes, Unicode should be understood when parsing. But I think it would make sense for the output to contain Punycode-encoded domains where necessary.
So, for example:

@somebody@みすきー.テスト

⇓

MENTION('somebody', 'xn--w8jxa7itv.xn--zckzah', '@somebody@みすきー.テスト')
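
As a rough illustration of that mapping, the sketch below splits a mention and converts its host to ASCII. It assumes a runtime with the WHATWG `URL` global (Node.js or a browser); `describeMention` is a hypothetical name, not part of the parser, and the returned object only loosely mirrors the `MENTION(...)` node shown above.

```javascript
// Hypothetical helper, not part of the parser: split "@user@host" and
// return the host in ASCII (Punycode) form alongside the original text.
function describeMention(mention) {
  const m = /^@([a-z0-9_.-]+)@([\p{L}\p{N}_.-]+)$/iu.exec(mention);
  if (m === null) return null; // not a remote mention

  let host;
  try {
    // The URL parser applies IDNA processing to the host name.
    host = new URL(`http://${m[2]}`).hostname;
  } catch {
    return null; // the URL parser rejected the host
  }
  return { username: m[1], host, raw: mention };
}

console.log(describeMention('@somebody@みすきー.テスト'));
```

The original text is kept in `raw`, so the mention can still be displayed the way it was written.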

Johann150 · 2021-11-10T21:22:11Z

I looked at how Mastodon does it; these are the regular expressions they use to recognize a mention:

USERNAME_RE   = /[a-z0-9_]+([a-z0-9_\.-]+[a-z0-9_]+)?/i
MENTION_RE    = /(?<=^|[^\/[:word:]])@((#{USERNAME_RE})(?:@[[:word:]\.\-]+[[:word:]]+)?)/i

https://github.com/mastodon/mastodon/blob/1114935e6486caaae6e4ba98b51ab803317acb03/app/models/account.rb#L61-L62

Since pegjs does not support \p{L} or \p{N}, which would be needed to express what [:word:] means in Ruby, it might be simpler to handle mentions with IDNs before the parser starts and convert them into Punycode-encoded domains. Then the parser itself would not have to be changed.

Mastodon's regular expression for mentions translates to JavaScript as follows (replacing [:word:] with the close JavaScript equivalent \p{L}\p{N}_, adding the u Unicode flag, and removing unnecessary parentheses):

/(?<=^|[^\/\p{L}\p{N}_])@[a-z0-9_]+(?:[a-z0-9_\.-]+[a-z0-9_]+)?(?:@[\p{L}\p{N}_\.-]+[\p{L}\p{N}_]+)?/iu
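
The pre-processing idea could be sketched roughly as follows. This is only an illustration under some assumptions: the expression above is split into two capture groups and given the `g` flag, the host is converted with the WHATWG `URL` global, and `hostToAscii` and `punycodeMentions` are hypothetical names rather than existing functions.

```javascript
// Expression from above, limited to remote mentions and split into two
// capture groups: the "@user@" prefix and the host. The g flag lets
// replace() visit every mention in the text.
const REMOTE_MENTION_RE =
  /(?<=^|[^\/\p{L}\p{N}_])(@[a-z0-9_]+(?:[a-z0-9_\.-]+[a-z0-9_]+)?@)([\p{L}\p{N}_\.-]+[\p{L}\p{N}_]+)/giu;

// Hypothetical helper: convert a host to ASCII via the WHATWG URL parser.
function hostToAscii(host) {
  try {
    return new URL(`http://${host}`).hostname;
  } catch {
    return null; // the URL parser rejected the host
  }
}

// Hypothetical pre-processing step, run before the text reaches the parser.
function punycodeMentions(text) {
  return text.replace(REMOTE_MENTION_RE, (match, prefix, host) => {
    const ascii = hostToAscii(host);
    // Leave the mention untouched if the host could not be converted.
    return ascii === null ? match : prefix + ascii;
  });
}

console.log(punycodeMentions('hi @somebody@münchen.de!'));
// The concern from the first comment: trailing text without a space
// is swallowed into the host (hypothetical address).
console.log(punycodeMentions('@user@example.comありがとう'));
```

The second call shows why the expression alone does not settle the question: Japanese text written directly after an ASCII domain is treated as part of the host name.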

marihachi · 2021-11-13T09:08:37Z

https://github.com/mathiasbynens/idn-allowed-code-points-regex
I found an IDNA implementation.

Johann150 mentioned this issue Nov 11, 2021

implement IDN mentions #86

Closed

This was referenced Nov 13, 2021

Question: IDNドメインの取り扱い misskey-dev/misskey#5826

Open

IDN support misskey-dev/misskey.js#32

Open

Johann150 added the enhancement New feature or request label Feb 5, 2022

marihachi added Feature and removed enhancement New feature or request labels Feb 9, 2022
