Extended bare key ranges to include all emojis #1002
base: main
Also explain better what's allowed in bare keys and remove the emoji example, since (though it's possible) we don't advise using emojis as bare keys.
This is an alternative fix for #954. It extends the bare key range to allow arbitrary emojis and, more generally, many harmless symbols that were so far excluded. At the same time it continues to allow arbitrary words in arbitrary languages in bare keys, which I consider very important for reasons of fairness and proper internationalization. And the pragmatic approach of simply defining ranges is preserved, instead of switching to an approach based on Unicode character classes that would make implementation considerably more complicated for those who don't have access to a fully-fledged Unicode support library.
The language of the written spec is also adapted to better describe what's now allowed in bare keys, and the emoji example has been deleted. (Using more or less arbitrary emojis as bare keys is now possible, but it's not something we recommend, as words make more meaningful keys.)
The extended ranges still try to exclude, insofar as reasonably possible, characters that are "problematic" in bare keys because they look similar to TOML's own meaningful punctuation (especially quotation marks, hashes, equals signs, commas, parentheses). This is just a "reasonable best effort" and not meant to be totally comprehensive; truly ensuring that visual inspection of a file gives identical results to what a parser sees is in any case impossible with Unicode (even in the total absence of bare keys).
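As a rough illustration of why the range-based approach is easy to implement without a Unicode library, here is a minimal Python sketch. The ASCII entries match what TOML 1.0 allows in bare keys; the two non-ASCII entries are placeholders picked for illustration only and are not the ranges defined by this PR. The names `BARE_KEY_RANGES`, `is_bare_key_char`, and `is_bare_key` are hypothetical.

```python
# Sketch of a range-based bare key check. No Unicode character-class
# lookup is needed, only integer comparisons on code points.
BARE_KEY_RANGES = [
    (0x2D, 0x2D),      # hyphen-minus
    (0x30, 0x39),      # digits 0-9
    (0x41, 0x5A),      # A-Z
    (0x5F, 0x5F),      # underscore
    (0x61, 0x7A),      # a-z
    (0x00C0, 0x00D6),  # placeholder range (assumption, not from the PR)
    (0x00D8, 0x00F6),  # placeholder range (assumption, not from the PR)
]


def is_bare_key_char(ch: str) -> bool:
    """Return True if the single character falls inside any allowed range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in BARE_KEY_RANGES)


def is_bare_key(key: str) -> bool:
    """Return True if the key is non-empty and every character is allowed."""
    return len(key) > 0 and all(is_bare_key_char(ch) for ch in key)
```

With these placeholder ranges, `is_bare_key("my-key_1")` and `is_bare_key("café")` hold, while keys containing a space, `=`, or `(` are rejected and would need to be quoted.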
Here's a list of all the new characters added by this PR: https://gist.github.com/ChristianSi/f3d97247c79d234326c47779227b1ff0. Note that I didn't include the first few emojis (#️ – hash sign, *️ – asterisk, ©️ – copyright, ®️ – registered, Let me know what you think!
@pradyunsg Can you take a look, please? Time to get 1.1 closer to a release candidate!
It's not really about "emoji" but about consistent character ranges; i.e. "this type is allowed, and this type isn't". Here's just the first thing I looked at. It currently skips:
But there are many parentheses; I marked the allowed ones here:
So basically all except these that are specifically excluded. "You can have super- and subscript parens, small parens, and fullwidth parens, but not regular parens or ornamental parens" is unexplainable, just as "you can use this smiling emoji but not that other smiling emoji" is.
@arp242: It's actually quite simple to explain: "You can use arbitrary words in arbitrary languages as bare keys. If you want to use more than one word as a bare key, use dashes or underscores to connect them, as whitespace is not allowed. It's better not to use other characters in bare keys, as that may or may not work." That's in fact more or less how I explain it now in the README. These additional non-letters are simply allowed in order to simplify the range definitions; it's not that anybody is supposed to use them. So there is no point in, and no need for, detailed explanations.
I can "explain" everything with this. People don't read specs, nor should they. They try stuff and see what happens. What you get now is that they try something, which works, and then they change it to something else and that doesn't work, which makes no sense.
Why should people try using something like LEFT PARENTHESIS UPPER HOOK in a key and then, when it works, move on to MEDIUM LEFT PARENTHESIS ORNAMENT, only to be disappointed that it doesn't work? That doesn't strike me as a very likely scenario. I think you're worrying too much.
@pradyunsg Are you around to take a look?
@pradyunsg Kind reminder that this is still open, and that maybe you need additional maintainers to support you?
@pradyunsg Ping?
I think this is the key improvement here. Allowing only certain character classes was never gonna work (issues with different versions of Unicode, or, if specified, way too specific and unwieldy to be maintainable long-term in any practical way). My opinions have already been discussed in the linked issue. The main point was this one: blowing up the spec for the sake of one single character class specification, and that is addressed here. While this PR includes many characters that arguably shouldn't be part of a bare key, ultimately, that's up to users (and there are many languages, including certain .NET languages, that support all characters in type and member names, so there's precedent): if they want to write code that includes emojis or brackets, they can. The more pragmatic and welcome change here is to simply be inclusive of other scripts and languages (which was the aim of the original PR that extended bare keys). Anything else is just what fishermen call "by-catch" ;). Great work for getting this ready!
are generally accepted, while not all symbols and punctuation marks are. If you
want to use a bare key made up of several words, use a suitable separator
character (such as an underscore or hyphen) between the words, as spaces are not
allowed. Note that bare keys are allowed to be composed of only digits, e.g.
1234, but are always interpreted as strings.
I like this. We tried to come up with some advisory language in the earlier attempt to extend bare keys, but decided against it. I think it is good to have it in 👍.
Also explain better what's allowed in bare keys and remove the emoji example, since (though it's possible) we don't advise using emojis as bare keys.
Fixes #954.