Are strings encoded in UTF-8, UTF-16, UCS-2, or what? For example, how would the String value "𐀀" (U+10000) be encoded in the various formats? (That is, what character encoding is used for Transit when encoding JSON to bytes?)
(Edit: Removed further question about illegal bytes in raw-string inputs in various languages.)
(Sorry for the thrashing -- I'm reopening this now that I see that yes, GitHub Issues are in fact being used for this project.)
The question is actually: "Is there a default encoding for the document, or is this negotiable?" In practice, is it nailed down to UTF-8 (in which case code points outside the BMP, U+10000 and up, are encoded as four-byte sequences rather than surrogate pairs), or is it something else?
There are actually two questions, one of which I think is already answered:
1. When a String value's characters are expressed in JSON, what encoding is used? (Answer from spec: non-ASCII characters need no escaping, but any Unicode character escapes must use UTF-16 code units, so a character outside the BMP is escaped as a surrogate pair.)
2. When the JSON is then written to a byte-oriented medium, what encoding is used for the character->bytes conversion?
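To make the two layers concrete, here is a rough sketch using Python's standard `json` module (not a Transit library; Transit's own behavior is exactly what this issue is asking about). It shows U+10000 first at the JSON text level, escaped and unescaped, and then at the byte level under two candidate encodings.

```python
import json

s = "\U00010000"  # U+10000, the first code point outside the BMP

# Question 1: how the character appears in the JSON text itself.
# With ensure_ascii=True (the default) Python escapes it as a UTF-16 surrogate pair.
escaped = json.dumps(s)
# With ensure_ascii=False the character is written literally, with no escape.
literal = json.dumps(s, ensure_ascii=False)

# Question 2: how the JSON text becomes bytes on the wire.
utf8_bytes = literal.encode("utf-8")       # 4 bytes for the character, plus the quotes
utf16_bytes = literal.encode("utf-16-be")  # a surrogate pair (2 x 2 bytes), plus the quotes

print(escaped)            # "\ud800\udc00"
print(utf8_bytes.hex())   # 22f090808022
print(utf16_bytes.hex())  # 0022d800dc000022
```

Note that the escaped form is pure ASCII, so it produces the same bytes under UTF-8, Windows-1252, and most other ASCII-compatible encodings; the ambiguity only bites when characters are written unescaped.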
The JSON spec suggests using UTF-8, but it doesn't demand it. I think it would be appropriate for Transit to lock this down so that we don't get nasty character encoding issues between platforms with different system defaults (e.g. Windows-1252 on Windows with English locales).
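As an illustration of the kind of cross-platform issue I mean, here is a small Python sketch (again plain `json`, not Transit; the sample string is just an example). A writer that emits UTF-8 and a reader that assumes a Windows-1252 system default will silently disagree about any unescaped non-ASCII character.

```python
import json

# Writer side: JSON text with an unescaped non-ASCII character, encoded as UTF-8.
payload = json.dumps("é", ensure_ascii=False)
wire = payload.encode("utf-8")

# Reader that agrees on UTF-8 recovers the original string.
good = json.loads(wire.decode("utf-8"))

# Reader that falls back to Windows-1252 gets mojibake, with no error raised.
bad = json.loads(wire.decode("cp1252"))

print(good)  # é
print(bad)   # Ã©
```

Pinning the encoding in the spec, and always passing it explicitly rather than relying on the platform default, avoids this class of bug.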
(As for MessagePack, a quick glance suggests it already specifies an encoding of UTF-8.)