Errors and Ors #30

natemartinsf · 2021-11-18T08:54:54Z

Let me know if I'm misunderstanding something major, or need to review the examples better.

My understanding of "or" parsers is that they stop errors from propagating. For example, if you have a chain of parsers like parser1.or(parser2).or(parser3), and you create an error in parser1, it will not be collected with your other parser errors. This makes sense, because an error with parser1 is an indication that you should try parser2.

The problem this causes is if all of your parsers are combined in a large "or" chain (which I think is typical in any non-trivial parser) then very few parser errors actually create an error. So if parser1, 2, and 3 all fail, then you don't get any errors at all. (it just doesn't parse)

If I'm understanding this correctly then I have a proposal: Propagate the error for whichever parser in the chain consumed the most tokens before finding an error. So if parser1 consumes 3 tokens before an error, parser2 consumes 1 token, and parser3 consumes 0 tokens, then the chain would collect the error from parser1.

zesterer · 2021-11-18T14:22:48Z · Maintainer

The purpose of "or" is to allow the parsing of more than one pattern in the same input location.

The precise way in which the errors emitted by each of the parsers are handled is left unspecified (because there is room for improvement in the future), but the important details are:

If at least one of the parsers parses successfully with no errors, no errors will be generated by the or chain.
If none of the parsers parse successfully with no errors, at least one error will be generated by the or chain.

> So if parser1, 2, and 3 all fail, then you don't get any errors at all. (it just doesn't parse)

When multiple parsers in the chain fail, Chumsky has a series of internal heuristics for determining which errors should be emitted. Generally speaking, we try to emit the error(s) that would be 'most useful' to the user.

> Propagate the error for whichever parser in the chain consumed the most tokens before finding an error.

Chumsky already does this (along with a few other things).
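To make the "furthest error wins" idea concrete, here is a minimal stand-alone sketch. It is not Chumsky's code and does not use its API: `ParseError`, `literal` and `choice` are hypothetical names invented for this illustration, and Chumsky's real heuristics are described above as doing more than this.

```rust
/// Hypothetical error type for this sketch: where matching stopped
/// and which character was expected there.
#[derive(Debug, Clone, PartialEq)]
struct ParseError {
    position: usize,
    expected: String,
}

/// Either (matched text, number of tokens consumed) or an error.
type ParseResult = Result<(String, usize), ParseError>;

/// Matches a fixed word at the start of the input.
fn literal(word: &str, input: &[char]) -> ParseResult {
    for (i, expected) in word.chars().enumerate() {
        if input.get(i) != Some(&expected) {
            return Err(ParseError { position: i, expected: expected.to_string() });
        }
    }
    Ok((word.to_string(), word.chars().count()))
}

/// Tries each alternative at the same input location.
/// The first success wins and any earlier errors are dropped.
/// If every alternative fails, the error from the alternative that
/// got furthest into the input is returned.
/// Panics if `words` is empty.
fn choice(words: &[&str], input: &[char]) -> ParseResult {
    let mut best: Option<ParseError> = None;
    for word in words {
        match literal(word, input) {
            Ok(ok) => return Ok(ok),
            Err(err) => {
                let further = best.as_ref().map_or(true, |b| err.position > b.position);
                if further {
                    best = Some(err);
                }
            }
        }
    }
    Err(best.expect("choice needs at least one alternative"))
}
```

With the input "letx" and the alternatives "lets!", "loop" and "fn", all three fail, and the reported error is the one from "lets!", which matched three characters before stopping.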


natemartinsf · 2021-11-18T23:23:26Z · Author

Oh interesting! I'll need to check then to see why my parsers aren't creating any errors when they fail to parse. Will a "Just" parser create an error if it doesn't get the right token, or do you need to use map_err or filter_map to make an error in all cases?


zesterer · 2021-11-18T23:33:56Z · Maintainer

A "just" parser will always produce an error if the exact input is not found.

If your parser is not producing an output (i.e. parse_recovery produces a None) but is also not producing any errors, then that's a bug in Chumsky and I'd be very interested to see your setup!
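As a rough sketch of that rule, the four possible combinations of output and errors can be classified as below. This assumes the result is a pair of an optional output and a list of errors, as the mention of parse_recovery producing a None suggests. `Outcome` and `classify` are hypothetical names for this illustration and are not part of Chumsky.

```rust
/// Hypothetical labels for the four combinations of output and errors.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Output present, no errors.
    Clean,
    /// Output present, but errors were also reported.
    Recovered,
    /// No output, at least one error.
    Failed,
    /// No output and no errors: the case described above as a bug.
    SuspectedBug,
}

/// Classifies an (output, errors) pair without inspecting its contents.
fn classify<O, E>(output: &Option<O>, errors: &[E]) -> Outcome {
    match (output.is_some(), errors.is_empty()) {
        (true, true) => Outcome::Clean,
        (true, false) => Outcome::Recovered,
        (false, false) => Outcome::Failed,
        (false, true) => Outcome::SuspectedBug,
    }
}
```

Note that an output of Some containing an empty vector still counts as an output here, so it is classified as `Clean`, not as a failure.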


natemartinsf · 2021-11-19T05:18:10Z · Author

When my parsers are failing on invalid input right now, they are producing
Some( [], ) Errors:[]

So it's not None but it is an empty array, and any of the tokens that aren't getting parsed aren't producing any errors.

I'm very sure I'm doing something wrong; I'm both new to Rust and new to creating programming languages. I'd be happy to show you what I'm trying, it's in a private repo right now but I'll add you to it.


zesterer Nov 19, 2021
Maintainer

That sounds like you have a .repeated(), .or_not() or .separated_by(...) as your 'top-level' parser. All of these are permitted to parse no input, and will happily do so if they find only invalid occurrences of the pattern.

You can force them to read all the way to the end of the input by adding a .then_ignore(end()) at the end of your parser. This forces the parser to consider the end of input as a valid part of the pattern, therefore forcing it to generate errors if the end of input is not found.

As an example, consider the following parser:

just('a').repeated()

Parsers are lazy. They only parse the input that matches their pattern. For the above pattern, giving it an input of "b" would cause it to succeed, but to return an empty vector.

We can force it to produce an error for such inputs by using

just('a').repeated().then_ignore(end())

Now when we provide it the input "b", the initial part of the parser exhibits the same behaviour, but the part we added expects an end of input immediately after and finds only b instead. The parser is therefore forced to go back and consider the errors from the previous parse attempt.
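The difference between the two parsers can be modelled with a small stand-alone sketch. This is plain Rust that imitates the behaviour described above, not Chumsky itself: `repeated_a` and `repeated_a_then_end` are hypothetical stand-ins for just('a').repeated() and just('a').repeated().then_ignore(end()), and the error message format is made up for the example.

```rust
/// Stand-in for `just('a').repeated()`: collects the leading 'a's and
/// always succeeds, even if there are none.
/// Returns the matches and the number of characters consumed.
fn repeated_a(input: &[char]) -> (Vec<char>, usize) {
    let matched: Vec<char> = input.iter().copied().take_while(|&c| c == 'a').collect();
    let consumed = matched.len();
    (matched, consumed)
}

/// Stand-in for `just('a').repeated().then_ignore(end())`: the same as
/// above, but anything left over after the 'a's is reported as an error.
fn repeated_a_then_end(input: &[char]) -> Result<Vec<char>, String> {
    let (matched, consumed) = repeated_a(input);
    match input.get(consumed) {
        // Nothing left: the end of input was found where it was expected.
        None => Ok(matched),
        // Something left: it is neither another 'a' nor the end of input.
        Some(found) => Err(format!(
            "found '{}' at position {}, expected 'a' or end of input",
            found, consumed
        )),
    }
}
```

For the input "b", `repeated_a` succeeds with an empty vector and zero characters consumed, while `repeated_a_then_end` returns an error pointing at the b.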
