Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error recovery #2038

Draft
wants to merge 197 commits into
base: main
Choose a base branch
from
Draft

Error recovery #2038

wants to merge 197 commits into from

Conversation

PieterOlivier
Copy link
Contributor

@PieterOlivier PieterOlivier commented Oct 4, 2024

This is a feature-tracking PR, will remain draft while we are working on the error recovery feature. Other smaller PRs will target this branch, and when ready we can merge this one. Where possible we try to do most of the reviewing in the open PRs that target the error-recovery branch, but sometimes stuff is missed there, and then it's better to comment on changes in this global-tracking-PR.

History:

jurgenvinju and others added 30 commits March 30, 2022 16:45
Sometimes recovery nodes start before the current location where
the parser failed to continue. Since the parser works with a
short queue of schedulede TODO's around the current cursor, we
might end up outside of this queue when recovering. This breaks
several unspecified invariants of the SGTBF implementations. For
now I added a detection that a recovery node is to be planned
before the currently retained history and filter that recovery
node. The next step will be to make sure backtracking over the
current location is made possible.
… because the next parser loop iteration always wants to advance one character
…e version of Rascal and (b) the edit command is wired to the edit IDEService
…t in the scheme by copyinhthe contents to a tmp file
PieterOlivier and others added 23 commits October 23, 2024 14:15
MultiErrorBug condenses the Rascal syntax to only cover the problematic
issue. The issue itself is of yet unsolved.
…cks-left

Trigger recovery when only recovery stacks are left
Fixed size inconsistency between allocating and enlarging
Added '\n' end matcher so end-of-line is also seen as skip terminator
if (!allowAmbiguity && allowRecovery && filters.isEmpty()) {
// Filter error-induced ambiguities
RascalValueFactory valueFactory = (RascalValueFactory) ValueFactoryFactory.getValueFactory();
parseForest = (ITree) new ErrorRecovery(valueFactory).disambiguateErrors(parseForest, valueFactory.bool(false));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this disambiguation step is smart in most cases, but it should be optional and on-by-default. When we do repair or autocomplete of parse errors, then some of the filtered ambiguities could have been insightful for the user. A comparable algorithm would rank the alternatives, for example by the amount of skipped tokens; there are many different options which should be left to the language designer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we skip this step here and leave it to the language designer in their parse function, or their services?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set allowAmbiguity=true a language designer already gets the opportunity to filter the trees themself. So it's already optional and on-by-default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the allowAmbiguity flag is meant to indicate something different. This option now conflates error recovery for a user with developer ambiguity in a grammar.

It's important for usability (performance) to still be able to exit tree construction quickly on unexpected ambiguity, and have predictable error recovery that represents all currently valid prefixes.

I don't know what the right solution is yet; but I think that allowAmbiguity=false should be ignored after error recovery

Copy link
Member

@jurgenvinju jurgenvinju Nov 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this less subtle-sounding:

  1. A normal ambiguity is almost never intended, confuses the downstream tools as well as the user, and possibly leads to high polinomial tree constructing time (very slow). Typically we disallow ambiguity after grammar deployment because of this, while we need ambiguity to debug grammars at development time.
  2. An error ambiguity is almost always inherent to the recovery process. The clusters represent the different viable prefixes at the moment the parser discovered it was stuck. The use needs all of them for useful features like auto-complete and quickfix. Disallowing those after deployment will hamper the usability of error recovery severely. And this is also flipped: Typically we don't need error recovery at grammar development time because we need to fix them ourselves in the grammar, while it is essential to turn on after deployment for usability.

In other words we do expect error ambiguïty but we don't expect grammatical ambiguity. So it can't be the same option.

It's good to know, as a kind of illustration, that error recovery for a single typo will produce ambiguous clusters of predictions on different levels, for different nonterminals in the tree. At the same time also on the same line, and same nonterminals, different prefixes could be active. As a result one recovered file {w,c,sh}ould receive quickfix proposals on different line/column positions. But if we already filtered heuristically because "ambiguity is not allowed", many of the valid and natural options will have been removed.

Copy link
Member

@jurgenvinju jurgenvinju Nov 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could pass in an external error filtering function similar to the actions, if that is required for speed, but otherwise I think it would be best to not filter at this stage and leave the ranking to quickfix and autocomplete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are compelling arguments. So we should remove the automatic error tree disambiguation altogether and leave it to the developer. A developer can always specify a filter to do the disambiguation.

Currently the disambiguation function takes two arguments (the tree and a boolean specifying if "normal" ambiguities are allowed). Maybe we should provide two separate functions that only take a tree argument so the programmar can just use one of them directly as a parse filter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We can have examples of reusable functions and filters in Recover.rsc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the ambiguity parameter interaction was too simplistic. But I want to add a few points for considiration.

  1. There will be regular users and power users of this functionality. I think we should recognize that, and make sure it's also usable for people do do not want to wack a whole forest just to get syntax highlighting and maybe an outline. I think having a keyword parameter that says something like: keepOnlyShortestErrorTree or a smaller version, would help make this more accessible.
  2. I think that since with the exception of features like auto-complete, the use of the whole forest is quite niche. There are quite some places where we'll run into error trees where we don't care for which of the trees in the forest is the right one, we just want to skip over the part and do something usefull for the rest.
  3. I want to be able to run the parser outside the evaluator, also if there is an error tree filter. This is usefull for rascal-lsp both DSLs (where everything is inside the evaluator) and for rascal itself. SInce error recovery will run even more frequent, and might sometimes spend a bit more time to recover, I would love if it we can keep the currrent performance feature of running it on a regular java thread without having to lock the evaluator. Passing it a post parse filter function will (I think) now require a lock on the evaluator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion!

  1. It's exactly the normal users who have to be protected against heuristic filters. Their semantics is unpredictable while if we built it in it becomes a contract for their downstream tooling. The shortest error is truly nothing more than a blunt heuristic which misfires more than it doesn't. Especially if we want to provide grammar-derived repair for those simpler users the shortest error makes little sense, but I'm pretty sure it almost never is the "right" one.

We've been here before in the 2000's with "powerful" disambiguation filters. Removing them because of their semantic inaccuracy was a social-economic wasteland. People were not happy their trees "suddenly" became ambiguous, while they were fully ignorant of what they couldn't be aware of: the trees they actually should care about. Hiding ambiguity is a mine field that I want to stay clear of.

  1. That's not true. The forest represents all the viable prefixes at the time of getting stuck. None of them are more likely to be the intended future representation, after repair, in general. It's worse even. Many of the predictions may have already failed and garbage collected, making the forest also incomplete in that sense. More incompleteness will make this only worse. Currently we have a declarative and complete contract for the output of error recovery; with any auto-filter the contract becomes ill-defined from a language semantics perspective.

  2. Error recovery and repair or diagnostics algorithmes do not belong in the parser contribution, ever. The requirements on the downstream processor dictate how to deal with errors or multiple ambiguous errors. A repair Algorithms takes each prefix to complete, a type checker may want to skip entire expressions and statements that contain errors at all, a diagnostics algorithm may fuzz around the error positions and reparse, etc. Different filters for different backends. Filtering errors is not a syntax aspect, it's a semantics aspect.

In one LSP server, different services will deal differently with the errors and their clusters. The first priority should be to make sure that algorithms written by beginners fail either gracefully or stay accidentally robust in the presence of errors. Features like visit and dynamic dispatch already deal gracefully with ambiguity and errors (they simply don't match). To make Algorithms robust without thinking at ask all the users must do is add a default case that does nothing.

The dirty edges w still have the brush up are Rascal features that look into trees without pattern matching. See the pr on Field projection. that solution still has edge cases that I'd like to see buttoned up.

Comment on lines +20 to +21
@synopsis{Check if a parse tree contains any error nodes, the result of error recovery.}
bool hasErrors(Tree tree) = /appl(error(_, _, _), _) := tree;
Copy link
Contributor

@sungshik sungshik Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The language server for Pico uses this function like this:

// definitions of variables
rel[str, loc] defs = {<"<var.id>", var.src> | /IdType var := input, !hasErrors(var)};

Without context, it could be unclear that it's about parse errors. (This code in the Pico language server occurs in the definition of the analyzer, so it could also be about type errors.)

So, minor suggestion: Because it's user-facing, maybe the function could be renamed to hasParseErrors. (And by the same token, maybe the module could be renamed to ParseErrorRecovery.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking for a solution where this function is not necessary. Since parse trees are a built-in feature of Rascal, there is opportunity for built-in features for error trees.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a branch off of this one, with an experimental semantics for field projection on error trees. Pattern matching already fails naturally on them but field projection fails too often on error trees, making existing code less robust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants