Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: indent on enter after certain EOL tokens #329

Closed
wants to merge 20 commits into from

Conversation

thorimur
Copy link
Contributor

@thorimur thorimur commented Sep 21, 2023

This PR causes indentation when hitting enter after certain tokens that appear at the end of a line.
Specifically, after:

  • by, do, try, finally, then, else, where, extends, deriving, :=, =>, and <|

It also inserts two indentations in the case that we hit enter immediately after : on a line that doesn't start with a binder bracket. We exclude binder-bracket-initial lines so as to avoid inserting improper indentation after already-indented hypotheses:

theorem foo (...)
    (...) :| <-- hit enter here
    | <-- cursor ends up here

If we encounter one of the "postindented EOL tokens" or a trailing : in the first line of a focus block, we add extra indentation as appropriate.


Ideally, we'd be able to describe indentation using Lean's parser, as an autoformatter implemented by the extension would.

Currently, this adds extra indents (after the first) by appending \t. I'm not sure if VS code can be asked to indent twice "natively" in this situation, and I'm not sure if using \t would be problematic when it comes to distinctions between space-based indents and tab-based indents. It seems that VS code converts these to spaces as appropriate, and cannot be asked to indent multiple times any other way.

This PR is WIP; please feel free to comment with any opinions or suggestions, especially if I've missed any common tokens! :)

},
{
// Indent twice after EOL `:`
"beforeText": "^.*\\s:\\s*$",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this is a good rule to have in general, as there are many common examples where this is the wrong indentation that are similar to the following:

theorem foo (...)
    (...) : 
    ...

How about restricting this rule to lines with certain identifiers at the start for now? (E.g. theorem, def, have, obtain, opaque, ...)? Also note that there may be attributes before the identifier, like @[simp].

A good way to test your regexes is to search mathlib4, std4 and lean4 (which uses slightly different formatting) using the VS Code search on the left side.

Copy link
Contributor Author

@thorimur thorimur Sep 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm—what about only restricting it to lines which don't start with whitespace followed by a parenthesis (or in general, binder bracket)?

This doesn't account for type signatures with really long hypothesis type signatures, but those usually require careful manual formatting anyway. The main advantage here is that it reduces the complexity of the regex (there are a lot of different initial tokens to account for!) and is extensible to new tactics with have-like syntax and new def-like commands (like irreducible_def).

I've tentatively taken this approach in cea0f89 since it's easy to implement, but I'm happy to change it if we do want to go the initial-token route. :)

vscode-lean4/language-configuration.json Show resolved Hide resolved
*/
{
// Indent focus blocks followed by a postindented EOL token twice
"beforeText": "^\\s*(·|\\.)\\s(.*\\s)?(by|do|try|finally|then|else|where|\\:=|=>|<\\|)\\s*$",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The escape in \\:= should not be necessary.

extends and deriving may also be good EOL tokens.

Copy link
Contributor Author

@thorimur thorimur Sep 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok—by the way, do you know if there's a reference somewhere for what regex flavor is used by VS code? I see some hints online that it's oniguruma, but I'm not sure. (In this case I can just test it and be assured it works, but I'm curious about the general case. :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7f89224 and 4828a03 respectively. :)

@mhuisi
Copy link
Collaborator

mhuisi commented Oct 12, 2023

(I haven't forgotten this PR and I'll get back to re-reviewing it as soon as I find time for it)

@mhuisi
Copy link
Collaborator

mhuisi commented Mar 13, 2024

After mulling on this PR for a while, I think that there are unfortunately too many edge cases where the post-indented EOL token heuristics do the wrong thing. I'm afraid that accidentally adding incorrect indentation on occasion is much more annoying than consistently not adding any additional indentation, so I would prefer to not add these rules.

I've thought a bit about whether there is a better way to add these rules than the one in this PR, but I can't think of any with the limited view of the text that VS Code gives us with this configuration option.

@thorimur
Copy link
Contributor Author

Makes sense! (I wish VS code would let us access the result of parsing somehow...)

Just to be clear, do you consider adding some indentation but not enough indentation to be part of the issue, or is the problem only the potential for "overshooting"? If the latter, I think there are some tokens which are always followed by at least one postindentation (such as try), for which we could develop rules that never overshoot. (We could also restrict to different circumstances to avoid overshooting, such as "by when the line starts with no whitespace and a def-like token".) (We'd want to verify any such universal style claims with regexes on mathlib before committing to them, of course.)

But, I undertand if you feel that adding some-but-not-necessarily-enough indentation causes more cognitive overhead than simply not doing so at all. :)

@mhuisi
Copy link
Collaborator

mhuisi commented Mar 15, 2024

Makes sense! (I wish VS code would let us access the result of parsing somehow...)

There's the textDocument/onTypeFormatting request, but the central problem is that you need to elaborate all dependencies of a declaration in order to be able to parse it, and so the latency for this request can be unacceptably high (think trying to use auto-completion in a portion of the document where the orange bars haven't disappeared yet). Hence, this approach unfortunately isn't really feasible for us, either.

Just to be clear, do you consider adding some indentation but not enough indentation to be part of the issue, or is the problem only the potential for "overshooting"?

I consider inconsistent behavior in general to be bad and especially bad w.r.t. something that you usually don't need to think about, like input. In this context, I'd say that inconsistently adding too much indentation is worse than inconsistently adding too little indentation, because having to actively revert something that the computer did incorrectly is always extra frustrating (compared to needing to help it along).

Specifically, I don't see a way that we can handle multi-line function signatures correctly in a way that doesn't overshoot or undershoot, and I think that this specific inconsistency will be sufficiently frustrating in practice (in either direction) that it's probably better to err on the side of simple indentation inheritance.

@mhuisi
Copy link
Collaborator

mhuisi commented Jun 20, 2024

Closing this for now due to reasons mentioned above.

@mhuisi mhuisi closed this Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants