-
Notifications
You must be signed in to change notification settings - Fork 858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "Permit more control characters in comments (#924)" #996
base: main
Are you sure you want to change the base?
Conversation
This reverts commit ab74958. I'm a simple guy. Someone reports a problem, I drink coffee and fix it. No one reports a problem? There is nothing to fix and I go drink beer. No one really reported this as a problem, but it *does* introduce needless churn for all TOML implementations and the test suite. Do we need to forbid *anything* in comments? Probably not, and in strings we probably only need to forbid \x00. But at least before it was consistent with strings, and more importantly, what everyone wrote code for, which is tested, and already works. [None of the hypotheticals](toml-lang#567 (comment)) on why this is "needed" are practical issues people reported, and most aren't even fixed: a comment can still invalidate the file, you must still parse each character in a comment as some are still forbidden, the performance benefits are very close to zero they might as well be zero, and you still can't "dump whatever you like" in comments. So it doesn't *actually* change anything, it just changes "disallow this set of control characters" to ... "disallow this set of control characters" (but for a different set). That's not really a substantial or meaningful change. The only (minor) real-world issue that was reported (from the person doing the Java implementation) was that "it's substantially more complicated to parse out control characters in comments and raise an error, and this kind of strictness provides no real advantage to users". And that's not addressed at all with this, so... --- And while I'm at it, let me have a complaint about how this was merged: 1. Two people, both of whom actually maintain implementations, say they don't like this change. 2. This is basically ignored. 3. Three people continue written a fairly large number of large comments, so anyone who wasn't already interested in this change unsubscribes and/or goes 🤷 4. "Consensus". Sometimes I feel TOML attracts people who like to argue things from a mile-high ivory tower with abstract arguments that have only passing familiarity with any actual pragmatic reality. Fixes toml-lang#995
I'd consider @abelbraaksma comment that triggered the whole change indeed a bug report, even though it's not a bug he had run into personally. He got 4 thumps-ups, and in the whole discussion that followed in that issue not a single person argued that the restrictive 1.0 status (where comments must fulfill most string requirements, despite not really being strings) should be kept. Plus, though you say that implementation maintainers didn't like this, the very next comment says:
As for the later discussion in #924, I suppose you refer to yourself (#924 (comment)) and @hukkin (#924 (comment)) expressing doubts? While the latter talked about "potential busywork for implementation maintainers", their main concern seem to have been about "about allowing lone CR in comments" – and this was subsequently addressed by keeping them indeed forbidden. Plus another maintainer, @marzer, explicitly welcomed the change (#924 (comment): "status quo bad"). And you yourself write above:
Arguably, the current main, which you wish to revert, comes closer to that ideal ("forbid as little as possible in comments"), while the old 1.0 standard is to forbid a lot, without anyone seeming to know exactly why. Also I'm a bit concerned by your suggestion that TOML maintainers are more important than other people here, that is, TOML users. Now admittedly, maintainers are essential, since without them there would be nothing to use. But users are important too, since without them there would be no point in maintaining anything. So I'd suggest not to play them out against each other. Plus there is the workload of future maintainers, of all the TOML implementations that don't exist yet but will hopefully be written some day. I'd say that while the difference is not large, for them life has become a bit simpler thanks to the accepted change (#924). In 1.0, comments are essentially a fifth string type – not exactly matching any of the four actual string types, but still having similar requirements when it comes to error checking. In the future, error checking in comments is considerably simplified, since you only have to check for CR (if line splitting has already happened) or you have to check for CR or LF to find the end of the comment (if not – with the former, if not followed by LF, signaling an error). That's arguable more simple than having to scan for arbitrary control characters and confusing the user with strange error messages ("illegal comment"). |
A small correction: Rechecking the ABNF, I see that VT (vertical tab) and FF (form feed) are still forbidden too, apparently due to #924 (comment). That wasn't much discussed at the time. If you (or anybody else here) makes the case that they too should be allowed in comments, to make the code simpler and more consistent, I think I'd be on board with that – if they occur at all, they're unlikely to cause problems, so there doesn't seem to be a particular reason to keep them forbidden. |
That comment was never implemented, so those thumbs-up are rather meaningless. cleishm will still have the same problems as well so that's meaningless as well.
My entire point is that for users this change makes zero practical difference.
For future maintainers it's more work, or at the very least the same amount of work. You still need to put in the plumbing which makes this hard (hard-ish anyway) in some cases. Your "small correction" is not "small". It's actually easier with 1.0 because now you can just reject these control characters outright in your tokenizer as they're never valid anywhere (since strings and comments have the same set). This is also what makes this more work than "just" an updated range. This is the entire problem, really. You all went off to have your own abstract discussions without even consulting the people who raised implementation concerns and came up with a "solution" which didn't even address the actual concerns that were raised, and instead you came up with some middle of the ground thing that has more or less the downsides of everything and the upsides of nothing. This is also why:
That's because they are, kind of anyway. Free-range no-consequence discussions are easy if you're off-loading the work to others. If you think it's so important, then why don't you go and fix things? If you had actually put the work then I could have had some respect for it. Now, I'm willing to listen to anyone, but this just doesn't really change anything for anyone in any meaningful way and the only real-world consequence is churn for implementers, so, you know, if you really want to change it then put in the work? That's really how it works pretty much for any volunteer effort. |
You're right that the current spec is a kinda strange middle ground that doesn't make all that much sense, and it's different from what I had remembered. But I think the right way is not backward, but forward by allowing VT and FF in as well. NUL and the line terminator chars CR and LF should remain forbidden, it stands to reason, but nothing else. I'll see if I can create a PR one of these days. |
A careful reader has found out that 1.0 is inconsistent in its rules regarding comments – specifically, the ABNF allows U+007F (DEL), while the written spec forbids it. So "revert to the old status quo" is simply not an option, since the old status quo is buggy and needs to be fixed. |
That was already spotted and fixed (as mentioned in #995's closing comment). |
Spotted and fixed, yes, but only if we don't revert. So let's not – the old status quo, since buggy, is simply no longer an option. |
That's a simple typographical fix, unrelated to the issues raised here. |
@arp242 Since you have started rejecting user contributions in toml-test that conform to the current unreleased TOML spec but disagree with what your particular take is, it's time for this issue to be settled. @pradyunsg In light of the conversation here over the past eight months, and in related feedback on the changes since #924, what is your take on this PR? |
We can't add tests for it because the TOML library needs updating, and that's not trivial (not super-hard, but not trivial). I don't see the point of spending time on it because this improves or fixes nothing for any users, as explained before. I'm not doing that, regardless of what ends up in TOML 1.1. I got tons of actually useful stuff to do. If you have deep philosophical problems with that then find someone else to maintain toml-test. I'm serious. To hell with "but akshually, I have better feels with this" changes that don't objectively improve anything and that don't address practical problems from random people who haven't put in any effort in anything. And no, I'm not going to engage in lengthy discussions endlessly responding to 6 paragraphs of woolly text. I have actual code to write. I spent tons of time on toml-test and associated things like the text-matrix, including on some things I don't really agree with, but there are limits. |
For reference, I'm the maintainer of the primary Rust TOML library and a member of the Cargo team though I'm not speaking on behalf of them. While I agree that we should revert the more controversial parts of 1.1 to get it out and re-examine how decisions are made (maybe my Rust bias but I feel like a more formalized RFC process should be adopted due to the ramifications of changes)., I don't think the spec and the conformance test suite should be limited by the decisions of an individual implementation maintainer. I know for Cargo, we've "bet big" on TOML but if it came to it, we could fork if needed. Thats different (and I watch this repo in the hopes that I can help avoid that). If an implementation is needed to match the spec, I could work to add unstable support for 1.1 to Rust's TOML parser. It might be a few weeks out as I am on vacation and have some other obligations to wrap up. |
This reverts commit ab74958.
I'm a simple guy. Someone reports a problem, I drink coffee and fix it. No one reports a problem? There is nothing to fix and I go drink beer.
No one really reported this as a problem, but it does introduce needless churn for all TOML implementations and the test suite. Do we need to forbid anything in comments? Probably not, and in strings we probably only need to forbid \x00. But at least before it was consistent with strings, and more importantly, what everyone wrote code for, which is tested, and already works.
None of the hypotheticals on why this is "needed" are practical issues people reported, and most aren't even fixed: a comment can still invalidate the file, you must still parse each character in a comment as some are still forbidden, the performance benefits are very close to zero they might as well be zero, and you still can't "dump whatever you like" in comments.
So it doesn't actually change anything, it just changes "disallow this set of control characters" to ... "disallow this set of control characters" (but for a different set). That's not really a substantial or meaningful change. The only (minor) real-world issue that was reported (from the person doing the Java implementation) was that "it's substantially more complicated to parse out control characters in comments and raise an error, and this kind of strictness provides no real advantage to users". And that's not addressed at all with this, so...
And while I'm at it, let me have a complaint about how this was merged:
Sometimes I feel TOML attracts people who like to argue things from a mile-high ivory tower with abstract arguments that have only passing familiarity with any actual pragmatic reality.
Fixes #995