feat: indent on enter after certain EOL tokens #329

thorimur · 2023-09-21T02:36:26Z

This PR causes indentation when hitting enter after certain tokens that appear at the end of a line.
Specifically, after:

by, do, try, finally, then, else, where, extends, deriving, :=, =>, and <|
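As a rough illustration of the rule described above, here is a minimal Python sketch of an end-of-line token check. The pattern and the name `ends_with_postindented_token` are assumptions made for illustration, not the regex from this PR. VS Code compiles the patterns in language-configuration.json as JavaScript regular expressions; the sketch stays within syntax that Python's `re` module treats the same way.

```python
import re

# Tokens taken from the PR description above.
WORD_TOKENS = ["by", "do", "try", "finally", "then", "else", "where",
               "extends", "deriving"]
SYMBOL_TOKENS = [":=", "=>", "<|"]

# Hypothetical pattern: one of the tokens at the very end of the line
# (ignoring trailing whitespace), preceded by whitespace or the line start.
POSTINDENTED_EOL = re.compile(
    r"^(.*\s)?("
    + "|".join(re.escape(t) for t in WORD_TOKENS + SYMBOL_TOKENS)
    + r")\s*$"
)


def ends_with_postindented_token(line: str) -> bool:
    """Return True if the line ends with one of the listed tokens."""
    return POSTINDENTED_EOL.match(line) is not None
```

Requiring whitespace (or the line start) before the token keeps identifiers that merely end in `by` or `do` from triggering the rule.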

It also inserts two indentations in the case that we hit enter immediately after : on a line that doesn't start with a binder bracket. We exclude binder-bracket-initial lines so as to avoid inserting improper indentation after already-indented hypotheses:

theorem foo (...)
    (...) :| <-- hit enter here
    | <-- cursor ends up here
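The trailing-colon rule with its binder-bracket exclusion could be sketched as follows. This is an assumption about the shape of the rule, not the pattern committed in the PR: the set of opening binder brackets and the name `indent_levels_after_colon` are hypothetical.

```python
import re

# Assumed opening binder brackets: ( [ { and the strict-implicit bracket.
# The negative lookahead rejects lines whose first non-blank character is
# one of them; the rest matches a whitespace-preceded ":" at end of line.
TRAILING_COLON = re.compile(r"^(?!\s*[(\[{⦃]).*\s:\s*$")


def indent_levels_after_colon(line: str) -> int:
    """Return 2 when the double-indent rule would fire, otherwise 0."""
    return 2 if TRAILING_COLON.match(line) else 0
```

With this sketch, a one-line signature such as `theorem foo (a : A) (b : B) :` gets two extra levels, while an already-indented hypothesis line such as `    (h : a = b) :` gets none and simply keeps its indentation.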

If we encounter one of the "postindented EOL tokens" or a trailing : in the first line of a focus block, we add extra indentation as appropriate.

depends on: feat: indent on enter after starting a focus block #328

Ideally, we'd be able to describe indentation using Lean's parser, as an autoformatter implemented by the extension would.

Currently, this adds extra indents (after the first) by appending \t. I wasn't sure whether VS Code can be asked to indent twice "natively" in this situation, or whether using \t would be problematic given the distinction between space-based and tab-based indents. It seems that VS Code converts the appended tabs to spaces as appropriate, and that it cannot be asked to indent multiple times any other way.
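To make the \t approach concrete, here is a small Python model of the observed behaviour. It is only a sketch of what the comment above reports, not VS Code's implementation: the function name, the default `tab_size`, and the simple tab-to-spaces substitution (which ignores tab stops) are all assumptions.

```python
def new_line_indent(prev_line: str, extra_tabs: int,
                    tab_size: int = 2, insert_spaces: bool = True) -> str:
    """Indentation of the line created by Enter, under the assumed model."""
    # Indentation inherited from the previous line.
    inherited = prev_line[: len(prev_line) - len(prev_line.lstrip(" \t"))]
    # One "native" indent plus the tabs appended by the rule.
    added = "\t" * (1 + extra_tabs)
    raw = inherited + added
    if not insert_spaces:
        return raw
    # Assumed conversion for space-indented files: one tab, one indent unit.
    return raw.replace("\t", " " * tab_size)
```

Under this model, a double indent in a space-indented file with a two-space unit comes out as four spaces, which matches the behaviour described in the comment.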

This PR is WIP; please feel free to comment with any opinions or suggestions, especially if I've missed any common tokens! :)

mhuisi · 2023-09-26T07:45:00Z

vscode-lean4/language-configuration.json

+        },
+        {
+            // Indent twice after EOL `:`
+            "beforeText": "^.*\\s:\\s*$",


I'm not sure that this is a good rule to have in general, as there are many common examples where this is the wrong indentation that are similar to the following:

theorem foo (...) (...) : ...

How about restricting this rule to lines with certain identifiers at the start for now (e.g. theorem, def, have, obtain, opaque, ...)? Also note that there may be attributes before the identifier, like @[simp].

A good way to test your regexes is to search mathlib4, std4 and lean4 (which uses slightly different formatting) using the VS Code search on the left side.
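The initial-token restriction suggested in this comment might look roughly like the following Python sketch. The token list is the partial one from the comment, and the attribute prefix handling and the name `colon_rule_applies` are assumptions for illustration.

```python
import re

# Partial token list from the review comment; a real rule would need more.
DECL_TOKENS = ["theorem", "def", "have", "obtain", "opaque"]

# Optional leading attributes such as @[simp], then one of the tokens as a
# whole word, then a whitespace-preceded ":" at the end of the line.
COLON_AFTER_DECL = re.compile(
    r"^\s*(@\[[^\]]*\]\s*)*(" + "|".join(DECL_TOKENS) + r")\b.*\s:\s*$"
)


def colon_rule_applies(line: str) -> bool:
    """Return True if the restricted trailing-colon rule would fire."""
    return COLON_AFTER_DECL.match(line) is not None
```

This sketch does not handle modifiers such as `private` or `noncomputable` in front of the token, so it would need to grow with every new initial token.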

thorimur · 2023-09-30

Hmm, what about only restricting it to lines which don't start with whitespace followed by a parenthesis (or, in general, a binder bracket)?

This doesn't account for signatures whose hypotheses have really long types, but those usually require careful manual formatting anyway. The main advantage here is that it reduces the complexity of the regex (there are a lot of different initial tokens to account for!) and is extensible to new tactics with have-like syntax and new def-like commands (like irreducible_def).

I've tentatively taken this approach in cea0f89 since it's easy to implement, but I'm happy to change it if we do want to go the initial-token route. :)

mhuisi · 2023-09-26T08:17:52Z

vscode-lean4/language-configuration.json

+        */
+        {
+            // Indent focus blocks followed by a postindented EOL token twice
+            "beforeText": "^\\s*(·|\\.)\\s(.*\\s)?(by|do|try|finally|then|else|where|\\:=|=>|<\\|)\\s*$",


The escape in \\:= should not be necessary.

extends and deriving may also be good EOL tokens.
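For a quick check outside the editor, the focus-block pattern from the diff above can be exercised in Python. The pattern below is the one from the diff with JSON string escaping removed and `\:=` written as `:=`, as suggested. The name `FOCUS_EOL` is hypothetical, and the sketch assumes the pattern behaves the same in Python's `re` as in VS Code's JavaScript regex engine, which should hold for the constructs used here.

```python
import re

# Focus dot (· or .), a space, optional preceding text, then one of the
# postindented tokens at the end of the line.
FOCUS_EOL = re.compile(
    r"^\s*(·|\.)\s(.*\s)?"
    r"(by|do|try|finally|then|else|where|:=|=>|<\|)\s*$"
)

samples = [
    "  · intro x; exact foo <|",   # focus block ending in <|
    "  . refine ⟨_, fun x =>",     # ASCII focus dot ending in =>
    "  · simp at h",               # focus block, no EOL token
    "  exact foo <|",              # EOL token, but no focus dot
]

for line in samples:
    print(repr(line), bool(FOCUS_EOL.match(line)))
```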

thorimur · 2023-09-30

Ah, ok. By the way, do you know if there's a reference somewhere for what regex flavor is used by VS Code? I see some hints online that it's Oniguruma, but I'm not sure. (In this case I can just test it and be assured it works, but I'm curious about the general case. :) )

thorimur · 2023-09-30

Done in 7f89224 and 4828a03 respectively. :)

mhuisi · 2023-10-12T07:44:55Z

(I haven't forgotten this PR and I'll get back to re-reviewing it as soon as I find time for it)

mhuisi · 2024-03-13T13:51:43Z

After mulling over this PR for a while, I think that there are unfortunately too many edge cases where the post-indented EOL token heuristics do the wrong thing. I'm afraid that accidentally adding incorrect indentation on occasion is much more annoying than consistently not adding any additional indentation, so I would prefer to not add these rules.

I've thought a bit about whether there is a better way to add these rules than the one in this PR, but I can't think of any with the limited view of the text that VS Code gives us with this configuration option.

thorimur · 2024-03-14T18:41:36Z

Makes sense! (I wish VS Code would let us access the result of parsing somehow...)

Just to be clear, do you consider adding some indentation but not enough indentation to be part of the issue, or is the problem only the potential for "overshooting"? If the latter, I think there are some tokens which are always followed by at least one postindentation (such as try), for which we could develop rules that never overshoot. (We could also restrict to different circumstances to avoid overshooting, such as "by when the line starts with no whitespace and a def-like token".) (We'd want to verify any such universal style claims with regexes on mathlib before committing to them, of course.)
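The kind of corpus check mentioned here could be sketched in Python as below. The helper names and the idea of comparing each matching line with the next non-blank line are assumptions for illustration, and the sketch only counts leading spaces.

```python
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path


def indent_width(line: str) -> int:
    """Number of leading spaces on the line."""
    return len(line) - len(line.lstrip(" "))


def following_indent_stats(lines: list[str], rule: re.Pattern[str]) -> Counter:
    """For each line matching `rule`, classify the next non-blank line."""
    stats: Counter = Counter()
    for i, line in enumerate(lines):
        if not rule.search(line):
            continue
        nxt = next((l for l in lines[i + 1:] if l.strip()), None)
        if nxt is None:
            continue
        delta = indent_width(nxt) - indent_width(line)
        key = "deeper" if delta > 0 else "same" if delta == 0 else "shallower"
        stats[key] += 1
    return stats


def stats_for_tree(root: str, rule: re.Pattern[str]) -> Counter:
    """Aggregate the statistics over every .lean file under `root`."""
    total: Counter = Counter()
    for path in Path(root).rglob("*.lean"):
        text = path.read_text(encoding="utf-8")
        total += following_indent_stats(text.splitlines(), rule)
    return total
```

A token would qualify as "never overshoots" only if the `same` and `shallower` counts stay at zero across the corpus. A trailing `by` on an indented continuation line of a multi-line signature, for example, is typically followed by a shallower line, which is the kind of case such a check would surface.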

But I understand if you feel that adding some-but-not-necessarily-enough indentation causes more cognitive overhead than simply not doing so at all. :)

mhuisi · 2024-03-15T09:32:39Z

> Makes sense! (I wish VS Code would let us access the result of parsing somehow...)

There's the textDocument/onTypeFormatting request, but the central problem is that you need to elaborate all dependencies of a declaration in order to be able to parse it, and so the latency for this request can be unacceptably high (think trying to use auto-completion in a portion of the document where the orange bars haven't disappeared yet). Hence, this approach unfortunately isn't really feasible for us, either.
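For reference, a `textDocument/onTypeFormatting` request has roughly the following shape on the wire. This Python sketch builds the JSON-RPC message with the parameter fields defined by the Language Server Protocol (`textDocument`, `position`, `ch`, `options`). The helper names and the example values are hypothetical, and nothing here implies that the Lean server handles this request.

```python
import json


def on_type_formatting_request(request_id: int, uri: str, line: int,
                               character: int, tab_size: int = 2) -> dict:
    """Build the JSON-RPC request a client would send after Enter is typed."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/onTypeFormatting",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
            "ch": "\n",  # the character that triggered formatting
            "options": {"tabSize": tab_size, "insertSpaces": True},
        },
    }


def frame(message: dict) -> bytes:
    """Add the Content-Length header used by the LSP base protocol."""
    body = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body
```

The server answers with a list of text edits (or null), so the editor has to wait for the response before the cursor position is final. That wait is where the latency concern described above comes in.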

> Just to be clear, do you consider adding some indentation but not enough indentation to be part of the issue, or is the problem only the potential for "overshooting"?

I consider inconsistent behavior in general to be bad and especially bad w.r.t. something that you usually don't need to think about, like input. In this context, I'd say that inconsistently adding too much indentation is worse than inconsistently adding too little indentation, because having to actively revert something that the computer did incorrectly is always extra frustrating (compared to needing to help it along).

Specifically, I don't see a way that we can handle multi-line function signatures correctly in a way that doesn't overshoot or undershoot, and I think that this specific inconsistency will be sufficiently frustrating in practice (in either direction) that it's probably better to err on the side of simple indentation inheritance.

mhuisi · 2024-06-20T11:31:08Z

Closing this for now due to reasons mentioned above.

thorimur added 4 commits September 20, 2023 20:54

feat: indent on enter after starting a focus block

89370b4

feat: postindented EOL tokens

54a2b37

feat: indent twice after by etc. following ·

088a9e3

feat: indent twice after EOL :

db7499c

thorimur mentioned this pull request Sep 21, 2023

feat: indent on enter after starting a focus block #328

Merged

feat: then, else

7ccfedf

mhuisi reviewed Sep 26, 2023

thorimur added 15 commits September 30, 2023 15:35

docs: create language-configuration.md

de810d7

* remove corresponding comment from json

Merge remote-tracking branch 'origin/indent-focus-block-rule' into in…

3d52bd0

…dent-after-tokens

docs: move comments to documentation file

24c587e

feat: alter type signature regex; update docs

cea0f89

feat: add extends/deriving as EOL tokens

4828a03

fix: unescape : in EOL regexes

7f89224

fix: allow nonidentifier characters before EOLs

cac82a7

* only applies to English-language EOLs
* doesn't account for all identifier chars yet

docs: add caveat re: indentation

5a5f8d7

feat: add termination_by('), decreasing_by

0c25f0e

docs: note about EOLs after focus blocks

a5b6535

fix: use word boundary before English-lang EOLs

ca02ad0

feat: include ← as EOL token

d1605cc

feat: from, :: EOL tokens

245ae1b

feat: basic calc support

50ad56b

feat: account for single-line comments

951f807

mhuisi closed this Jun 20, 2024
