
WG21 SG16 Unicode study group

SG16 is an ISO/IEC JTC1/SC22/WG21 C++ study group tasked with improving Unicode and text processing support within the C++ standard.

If you would like to contribute to the discussion, please subscribe to our mailing list at https://lists.isocpp.org/mailman/listinfo.cgi/sg16.

Meetings are generally held twice a month; invitations are sent to the mailing list. Summaries of past meetings are available at https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md.

A standing paper that describes our intended scope, directives, guidelines and constraints is available at P1238 - SG16: Unicode Direction. Anyone wanting to follow or contribute to SG16 should become familiar with it.

We also provide input on other proposals within WG21 and WG14 when those proposals touch on topics listed in P1253 - Guidelines for when a WG21 proposal should be reviewed by SG16.

The following sections list projects, Unicode papers, and ISO papers that fall under the purview of SG16.

Active Projects

Project Description/Links
Boost.Text What a C++ standard Unicode library might look like
Code repository
Documentation
ztd.text The premier library for handling text in different encoding forms and reducing transcoding bugs in your C++ software
Code repository
Documentation
text_view A C++ Concepts based character encoding and code point enumeration library
Code repository

Unicode papers

Document Number Title/Notes/Links
L2/23-153 Opposition to and Comment on L2/23-107
L2/23-107 Proper Complex Script Support in Text Terminals
L2/21-038 Clarify guidance for use of a BOM as a UTF-8 encoding signature

ISO/IEC JTC1/SC2/WG2 (Unicode) Papers

Active Papers

WG2 Number Title/Notes/Links
WG2-N5168 Name aliases and UTF-16 encoding scheme are inconsistent with the Unicode Standard
Per WG2-N5175, WG2-N5174 contains the proposed resolution.
WG2-N5174 Proposed changes concerning Character Name Aliases in ISO/IEC 10646
This is the proposed resolution for WG2-N5168.

ISO/IEC JTC1/SC22/WG21 (C++) Papers

Active Papers

WG21 Number Title/Notes/Links
P3374 Adding formatter for fpos<mbstate_t>
P3364 Remove Deprecated u8path overloads From C++26
P3263 Encoding annotated char
P3258 Formatting of charN_t
P3154 Deprecating signed character types in iostreams
P3070 Formatting enums
P2873 Remove Deprecated Locale Category Facets For Unicode from C++26
P2758 Emitting messages at compile time
P2749 Down with “character”
P2729 Unicode in the Library, Part 2: Normalization
P2728 Unicode in the Library, Part 1: UTF Transcoding
P2626 charN_t incremental adoption: Casting pointers of UTF character types
P2528 C++ Identifier Security using Unicode Standard Annex 39
P2348 Whitespaces Wording Revamp
P2319 Prevent path presentation problems
P1953 Unicode Identifiers And Reflection
P1729 Text Parsing
P1629 Standard Text Encoding
P1628 Unicode character properties
P1030 std::filesystem::path_view
P0244 Text_view: A C++ concepts and range based character encoding and code point enumeration library

Accepted C++26 Papers

WG21 Number Title/Notes/Links
P2909 Fix formatting of code units as integers
(Dude, where’s my char?)
P2872 Remove wstring_convert From C++26
P2871 Remove Deprecated Unicode Conversion Facets From C++26
P2845 Formatting of std::filesystem::path
P2741 user-generated static_assert messages
P2558 Add @, $, and ` to the basic character set
P2361 Unevaluated strings literals
P1885 Naming Text Encodings to Demystify Them
P1854 Conversion to execution encoding should not lead to loss of meaning

Accepted C++23 Papers

WG21 Number Title/Notes/Links
P2736 Referencing the Unicode Standard
P2713 Escaping improvements in std::format
P2693 Formatting thread::id and stacktrace
P2675 LWG3780: The Paper (format's width estimation is too approximate and not forward compatible)
P2653 Update Annex E based on Unicode 15.0 UAX 31
P2572 std::format() fill character allowances
P2513 char8_t Compatibility and Portability Fixes
P2460 Relax requirements on wchar_t to match existing practices
P2419 Clarify handling of encodings in localized formatting of chrono types
P2372 Fixing locale handling in chrono formatters
P2362 Remove non-encodable wide character literals and multicharacter wide character literals
P2316 Consistent character literal encoding
P2314 Character sets and encodings
P2295 Support for UTF-8 as a portable source file encoding
P2290 Delimited escapes sequences
P2246 Character encoding of diagnostic text
P2223 Trimming whitespaces before line splicing
P2201 Mixed string literal concatenation
P2093 Formatted output
P2071 Named universal character escapes
P2029 Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals
P1949 C++ Identifier Syntax using Unicode Standard Annex 31
P1072 basic_string::resize_and_overwrite

Accepted C++20 Papers

WG21 Number Title/Notes/Links
P1892 Extended locale-specific presentation specifiers for std::format
P1868 🦄 width: clarifying units of width and precision in std::format
P1423 char8_t backward compatibility remediation
P1139 Address wording issues related to ISO 10646
P1041 Make char16_t/char32_t string literals be UTF-16/32
P1025 Update The Reference To The Unicode Standard
P0645 Text Formatting
P0482 char8_t: A type for UTF-8 characters and strings

Inactive Papers


The following papers are no longer being pursued.

WG21 Number Title/Notes/Links
P2773 Considerations for Unicode algorithms
(This is an informational paper and was reviewed by SG16 in February and March of 2023)
P2498 Forward compatibility of text_encoding with additional encoding registries
(Dropped by the author following lack of consensus for a change in LEWG)
P2491 Text encodings follow-up
(The concerns raised in this paper were avoided by changes made in R10 of P1885)
P2297 Wording improvements for encodings and character sets
(The goals of this paper were mostly addressed via P2314)
P2194 The character set of C++ source code is Unicode
(The goals of this paper are now being pursued via P2314 and P2297)
P2178 Misc lexing and string handling improvements
(The goals of this paper are now being pursued via P1854, P2223, P2295, P2297, P2348, P2316, P2361, P2362, and P2460)
P2020 Locales, Encodings and Unicode
(This paper did not contain a concrete proposal and no revisions are expected; it will be used as reference material)
P1880 uNstring Arguments Shall Be UTF-N Encoded
(This proposal was withdrawn by the author upon determining that the complexity of the required wording updates would outweigh their benefits)
P1879 Please Don't Rewrite My String Literals
(This proposal was withdrawn by the author)
P1859 Standard terminology for execution character set encodings
(The goals of this proposal were accomplished via P2314)
P1844 Enhancement of regex
(Severe ABI concerns prevent updating std::regex. We will explore deprecating and replacing it)
P1097 Named character escapes
(Superseded by P2071)
P0353 Unicode Friendly Encoding Conversions for the Standard Library
(This proposal is not being advocated at this time; more foundational concerns need to be addressed first)
P0169 regex with Unicode character types
(This proposal is not being advocated at this time; more foundational concerns need to be addressed first)

ISO/IEC JTC1/SC22/WG14 (C) Papers

Active Papers

WG14 Number Title/Notes/Links
N3366 Restartable Functions for Efficient Character Conversions, r13
(Previously N2431 (R0), N2440 (R1), N2500 (R2), N2595 (R3), N2620 (R4), N2730 (R5), N2902 (R6), N2966 (R7), N2999 (R8), N3031 (R9), N3075 (R10), N3095 (R11), N3265 (R12))
N3145 $ in Identifiers v2
(Previously N3046 (R0))
N3124 Aligning Universal Character Names Constraints with C++
N3095
N3016 Unicode Length Modifiers v3
N2948 Accessing the command line arguments outside of main()
N2932 C Identifier Security using Unicode Standard Annex 39 v2
(Previously N2916 (R0))
N2785 Delimited escapes sequences

Accepted C23 Papers

WG14 Number Title/Notes/Links
N2940 Removing trigraphs??!
N2939 Identifier Syntax Fixes
N2836 C Identifier Syntax using Unicode Standard Annex 31
(Previously N2777 (R0))
N2828 Unicode Sequences More Than 21 Bits are a Constraint Violation
N2728 char16_t & char32_t string literals shall be UTF-16 & UTF-32 | r0
N2701 @ and $ in source and execution character set
N2653 char8_t: A type for UTF-8 characters and strings (Revision 1)
(Previously N2231 (R0))
N2594 Mixed Wide String Literal Concatenation
N2563 Character encoding of diagnostic text
N2418 Adding the u8 character prefix
(Previously N2198 (R0))

Inactive Papers

WG14 Number Title/Notes/Links
N3265 Restartable Functions for Efficient Character Conversions | r12
(Superseded by N3366)
N3095 Restartable Functions for Efficient Character Conversions | r11
(Superseded by N3265)
N3075 Restartable Functions for Efficient Character Conversions | r10
(Superseded by N3095)
N3046 $ in Identifiers
(Superseded by N3145)
N3031 Restartable Functions for Efficient Character Conversions | r9
(Superseded by N3075)
N2999 Restartable Functions for Efficient Character Conversions | r8
(Superseded by N3031)
N2983 Unicode Length Modifiers v2
(Superseded by N3016)
N2966 Restartable Functions for Efficient Character Conversions | r7
(Superseded by N2999)
N2916 C Identifier Security using Unicode Standard Annex 39
(Superseded by N2932)
N2902 Restartable and Non-Restartable Functions for Efficient Character Conversions | r6
(Superseded by N2966)
N2875 Unicode Length Modifiers
(Superseded by N2983)
N2777 C Identifier Syntax using Unicode Standard Annex 31
(Superseded by N2836)
N2730 Restartable and Non-Restartable Functions for Efficient Character Conversions | r5
(Superseded by N2902)
N2620 Restartable and Non-Restartable Functions for Efficient Character Conversions | r4
(Superseded by N2730)
N2595 Restartable and Non-Restartable Functions for Efficient Character Conversions | r3
(Superseded by N2620)
N2500 Restartable and Non-Restartable Functions for Efficient Character Conversions | r2
(Superseded by N2595)
N2440 Restartable and Non-Restartable Functions for Efficient Character Conversions | r1
(Superseded by N2500)
N2431 Restartable and Non-Restartable Functions for Efficient Character Conversions
(Superseded by N2440)
N2231 char8_t: A type for UTF-8 characters and strings
(Superseded by N2653)
N2198 Adding the u8 character prefix
(Superseded by N2418)