This repository has been archived by the owner on Mar 27, 2019. It is now read-only.

Character encodings #7

Open
ezzatron opened this issue Dec 8, 2015 · 0 comments

ezzatron (Collaborator) commented Dec 8, 2015

This library currently handles conversion between strings and sequences of codepoints, in an attempt to provide an intuitive, easy-to-use API. It may be better to require callers to perform the codepoint conversion themselves, so that the library only ever receives and returns codepoints.
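To make the proposal concrete, here is a minimal sketch of what caller-side conversion could look like. The names `toCodepoints` and `fromCodepoints` are hypothetical helpers for illustration only; they are not part of this library's API, and the sketch assumes the caller starts from an ordinary JavaScript (UTF-16) string.

```typescript
// Hypothetical caller-side helpers; not part of this library's API.

// Decode a JavaScript (UTF-16) string into an array of Unicode code points.
function toCodepoints(input: string): number[] {
  const codepoints: number[] = [];
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes one code point.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        codepoints.push(((unit - 0xd800) << 10) + (next - 0xdc00) + 0x10000);
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    // BMP character, or an unpaired surrogate passed through unchanged.
    codepoints.push(unit);
  }
  return codepoints;
}

// Encode an array of code points back into a JavaScript (UTF-16) string.
function fromCodepoints(codepoints: number[]): string {
  let output = "";
  for (let i = 0; i < codepoints.length; i++) {
    const cp = codepoints[i];
    if (cp > 0xffff) {
      // Supplementary-plane code points need a surrogate pair.
      const offset = cp - 0x10000;
      output += String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
    } else {
      output += String.fromCharCode(cp);
    }
  }
  return output;
}
```

Under this approach the library would accept and return `number[]`, and what to do with unpaired surrogates would be the caller's decision (the sketch simply passes them through).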

This would let systems use a third-party UTF-8 implementation such as utf8. It also neatly sidesteps the question of which encodings PRECIS deems valid. For example, from RFC 7613:

An entity that prepares a string according to this profile MUST first
map fullwidth and halfwidth characters to their decomposition
mappings (see Unicode Standard Annex #11 [UAX11]). This is necessary
because the PRECIS "HasCompat" category specified in Section 9.17 of
[RFC7564] would otherwise forbid fullwidth and halfwidth characters.
After applying this width-mapping rule, the entity then MUST ensure
that the string consists only of Unicode code points that conform to
the PRECIS IdentifierClass defined in Section 4.2 of [RFC7564]. In
addition, the entity then MUST encode the string as UTF-8 [RFC3629].

(emphasis mine)
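If the library dealt only in codepoints, the final UTF-8 step from the quoted passage would fall to the caller. A rough sketch of that step follows, assuming the input has already been prepared and validated elsewhere. `encodeUtf8` is a hypothetical name, not something this library provides, and in practice a caller might use an existing implementation instead of hand-rolling one.

```typescript
// Hypothetical caller-side helper; not part of this library's API.
// Encodes Unicode code points as UTF-8 bytes, following the RFC 3629 layout.
function encodeUtf8(codepoints: number[]): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < codepoints.length; i++) {
    const cp = codepoints[i];
    // Reject values outside the Unicode range and surrogate code points,
    // neither of which can be represented in well-formed UTF-8.
    if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
      throw new RangeError("Cannot encode code point at index " + i);
    }
    if (cp < 0x80) {
      bytes.push(cp); // 1 byte: 0xxxxxxx
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)); // 3 bytes
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      ); // 4 bytes
    }
  }
  return bytes;
}
```

Keeping this outside the library means the choice of encoder, and the handling of invalid input, stays with the calling system.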

See discussion under #1 for more information.
