Encoding examples
Tom Honermann edited this page Jul 2, 2017 · 2 revisions
Text_view output iterators produce an encoded code unit sequence from characters written through the iterator. A character is a class object with an associated character set and a code point value. For example:
using CT = utf8_encoding::character_type;
std::string s;
auto it = make_otext_iterator<utf8_encoding>(std::back_insert_iterator<std::string>{s});
*it = CT{0x00F8}; // Encodes U+00F8 as \xC3\xB8.
assert(s[0] == '\xC3');
assert(s[1] == '\xB8');
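The byte values in the assertions above follow from the UTF-8 bit layout. The following is a minimal standalone sketch of that arithmetic; it does not use text_view, and `encode_utf8` and `demo_encode` are hypothetical names introduced only for illustration. It assumes the input is a single code point and rejects surrogates and out-of-range values with `std::invalid_argument`, which is this sketch's choice rather than text_view's exception type.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of text_view: encodes one code point as UTF-8.
std::string encode_utf8(char32_t cp) {
    // Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not
    // Unicode scalar values and have no UTF-8 encoding.
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Usage: U+00F8 falls in the two-byte range, giving 0xC3 0xB8.
inline void demo_encode() {
    const std::string s = encode_utf8(0x00F8);
    assert(s.size() == 2);
    assert(s[0] == '\xC3');
    assert(s[1] == '\xB8');
}
```

For U+00F8 (binary 11111000), the top two bits go into the lead byte (0xC0 | 0x03 = 0xC3) and the low six bits into the continuation byte (0x80 | 0x38 = 0xB8).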
By default, an exception is thrown when an error occurs during an encoding operation. For example:
using CT = utf8_encoding::character_type;
std::string s;
auto it = make_otext_iterator<utf8_encoding>(std::back_insert_iterator<std::string>{s});
*it = CT{0xD800}; // UTF-16 high surrogate; not a valid character. throws text_encode_error.
Text_view's error policies allow creating output iterators that, instead of throwing, write a character-set-specific substitution character when an error is encountered during an encoding operation. For example:
using CT = utf8_encoding::character_type;
std::string s;
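// Select the permissive error policy explicitly; the default policy would throw.
auto it = make_otext_iterator<utf8_encoding, text_permissive_error_policy>(std::back_insert_iterator<std::string>{s});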
*it = CT{0xD800}; // UTF-16 high surrogate; not a valid character.
assert(s[0] == '\xEF'); // The UTF-8 code unit sequence 0xEF 0xBF 0xBD is the
assert(s[1] == '\xBF'); // encoding of U+FFFD, the Unicode substitution character.
assert(s[2] == '\xBD');
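The substitution behavior can also be illustrated without text_view. The sketch below is a standalone approximation, not the library's implementation: `encode_utf8_lossy` and `demo_lossy` are hypothetical names, and the function simply replaces any value that is not a Unicode scalar value with U+FFFD before writing UTF-8 code units to an output iterator.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical helper, not part of text_view: writes the UTF-8 encoding of cp
// to out, substituting U+FFFD for surrogates and out-of-range values.
template <class OutIt>
OutIt encode_utf8_lossy(char32_t cp, OutIt out) {
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        cp = 0xFFFD;  // Unicode replacement character
    }
    if (cp < 0x80) {
        *out++ = static_cast<char>(cp);
    } else if (cp < 0x800) {
        *out++ = static_cast<char>(0xC0 | (cp >> 6));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        *out++ = static_cast<char>(0xE0 | (cp >> 12));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        *out++ = static_cast<char>(0xF0 | (cp >> 18));
        *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Usage: a lone high surrogate is written as U+FFFD (0xEF 0xBF 0xBD).
inline void demo_lossy() {
    std::string s;
    encode_utf8_lossy(0xD800, std::back_inserter(s));
    assert(s.size() == 3);
    assert(s[0] == '\xEF');
    assert(s[1] == '\xBF');
    assert(s[2] == '\xBD');
}
```

Substituting rather than throwing keeps the output well-formed UTF-8 at the cost of losing the original value, which is the trade-off a permissive policy makes.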