
Feat/mini parse 2 alpha #62

Draft
wants to merge 7 commits into feat/mini-parse-rework

Conversation

@stefnotch (Contributor) commented Feb 3, 2025

tl;dr: I finally got to try out all my mini-parse ideas. 🎉 I am now wondering which of these are worth keeping, and which ones are not.

I tried writing a mini-parse library which

  • keeps track of the input stream type
    • This gives us a typed variation of token(kind: Kind, value: string)
    • In winnow-land, this design also lets them generically operate on strings or binary streams. This isn't useful for us.
    • I am using this for the span combinator, but I'm not convinced that design is good.
    • tl;dr: It's nifty, but I don't care too much about this.
  • keeps track of whether a parser can backtrack
    • This is used to almost always guarantee "no backtracking". E.g. seq2(tryToken("keyword", "import"), token("symbol", ";")) has the semantics "only the first parser in the sequence can backtrack", so only that part gets an if (result == null) return null; check (see the sketch after this list).
    • or parsers assert that their children must be capable of backtracking; otherwise those children would be useless.
    • This made me realize that the imports grammar, as written, actually requires a two-token lookahead.
    • However, it makes writing parsers a lot more verbose. See ImportGrammar.ts
    • It should, in theory, make debugging parsing failures easier.
    • tl;dr: I think it's super neat that this is possible. I am, however, not convinced that it's what we want for our implementation.
  • exposes the _run method publicly as parseNext. The intent is that users can write parsers in an alternative, hand-written style (see Parser2.test.ts), which lets us hand-write parsing logic for hot paths of the code; the sketch after this list shows both styles.
  • has way less overhead
    • Yeah, that is useful. I wonder why its overhead is so much lower.
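
To make the backtracking bookkeeping concrete, here is a minimal TypeScript sketch. The names (tryToken, token, seq2, parseNext) follow the ones used above, but the Lexer interface and all types and signatures are simplified stand-ins, not the actual mini-parse API:

```ts
type Span = [number, number];
interface Token {
  kind: string;
  value: string;
  span: Span;
}

interface Lexer {
  /** Current position, usable as a plain source offset. */
  checkpoint(): number;
  reset(checkpoint: number): void;
  /** Consume and return the next token, or null at the end of input. */
  next(): Token | null;
}

/** A parser that may fail: it rewinds the lexer and returns null on mismatch. */
interface BacktrackingParser<T> {
  parseNext(lexer: Lexer): T | null;
}
/** A parser that must succeed once this branch is committed: no null case. */
interface CommittedParser<T> {
  parseNext(lexer: Lexer): T;
}

function tryToken(kind: string, value: string): BacktrackingParser<Token> {
  return {
    parseNext(lexer) {
      const cp = lexer.checkpoint();
      const tok = lexer.next();
      if (tok !== null && tok.kind === kind && tok.value === value) return tok;
      lexer.reset(cp); // mismatch: rewind so the caller can try an alternative
      return null;
    },
  };
}

function token(kind: string, value: string): CommittedParser<Token> {
  return {
    parseNext(lexer) {
      const tok = lexer.next();
      if (tok === null || tok.kind !== kind || tok.value !== value) {
        // We are committed, so a mismatch is a hard parse error, not a null.
        throw new Error(`expected ${kind} "${value}"`);
      }
      return tok;
    },
  };
}

/** Only the first child may backtrack, so only its result gets a null check. */
function seq2<A, B>(
  first: BacktrackingParser<A>,
  second: CommittedParser<B>,
): BacktrackingParser<[A, B]> {
  return {
    parseNext(lexer) {
      const a = first.parseNext(lexer);
      if (a === null) return null; // the single backtracking point
      return [a, second.parseNext(lexer)];
    },
  };
}

// Combinator style:
const importStmt = seq2(tryToken("keyword", "import"), token("symbol", ";"));

// The same logic hand-written against parseNext, the style intended for
// hot paths:
const importStmtByHand: BacktrackingParser<[Token, Token]> = {
  parseNext(lexer) {
    const kw = tryToken("keyword", "import").parseNext(lexer);
    if (kw === null) return null;
    return [kw, token("symbol", ";").parseNext(lexer)];
  },
};
```

The payoff is that the null checks are enforced by the two parser types instead of by convention: forgetting the check on a backtracking child no longer type-checks.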

To try it out, I wrote a parser-combinator layer that calls the new implementation, and then rewrote the imports grammar on top of it.
Benchmarks are on Discord, but the rough result is that performance could go from wgsl-linker LOC/sec: 33.229 to wgsl-linker LOC/sec: 123.075.

The unit tests are failing, and that's fine.

@stefnotch (Contributor, Author) commented Feb 3, 2025

For tracking spans, I can think of a few different options:

  • Add previousToken to checkpoints (shape sketched after this list) => track the first token with a custom lexer; the last token is what the checkpoint says
  • Add the rule that parseNext always leaves you at the end of a token. => track first token with a custom lexer, end is checkpoint
  • Add the rule that you are always at the beginning of a token, and add previousToken to checkpoints => [checkpoint().token.span[1], checkpoint()]. Could also allow for another optimisation
  • Or not having a span() combinator, and manually building it up from the info that is already present in the tokens.
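
For the two options that put previousToken into checkpoints, a checkpoint would carry the token alongside the position, so reset() can never leave a stale token behind. A hypothetical shape, reusing the Token type from the sketch in the first comment:

```ts
// Hypothetical checkpoint shape for the previousToken-in-checkpoints options.
// reset() restores both fields together, so the cached token stays valid.
interface CheckpointWithToken {
  position: number;
  token: Token | null; // the previously consumed token; null at input start
}
```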

What does not work

  • Only using checkpoints, because of whitespace: a checkpoint taken before a parser runs still sits before any skipped whitespace, so it would not mark the true start of the first token.
  • Storing a "previous token", because peek + reset would invalidate it (see the sketch below).
  • Splitting parseNext into skipIgnored and parseNext, where parseNext would always leave you at the end of a token and skipIgnored would always bring you to the start of one => checkpoints are spans. However, I wouldn't know whether a child parser already called skipIgnored, so the checkpoint might not be reliable.
  • Just adding "prevTokenEnd" and "nextTokenStart" to the API, because .reset() would invalidate the cached next token and force us to inefficiently recompute it. There are more efficient variations above.
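
To spell out the second point, here is a sketch (using the hypothetical Lexer interface from the first code block) of how a plain cached previous token goes stale:

```ts
// Suppose the lexer kept an internal previousToken, updated on every next().
function staleCacheDemo(lexer: Lexer) {
  const before = lexer.checkpoint();
  lexer.next();        // peeking: previousToken now points at the peeked token
  lexer.reset(before); // position is restored, but previousToken is not,
                       // so the cache is stale; storing the token inside the
                       // checkpoint (as in the options above) avoids this
}
```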

@stefnotch (Contributor, Author) commented Feb 4, 2025

I'm picking the option where parseNext always leaves you at the end of a token, and the span combinator does a peek() (peeking is done as const before = lexer.checkpoint(); const s = lexer.peek().span[0]; lexer.reset(before);). A sketch of the resulting combinator is below.
Then I'll make sure that peeking is optimized (it'll get used a lot), and more importantly: it's a zero-cost abstraction!
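
A sketch of what that span combinator could look like, reusing the hypothetical interfaces from the first code block (checkpoints are assumed to be plain source offsets, and the peek is spelled out as checkpoint/next/reset):

```ts
function span<T>(
  child: BacktrackingParser<T>,
): BacktrackingParser<{ value: T; span: Span }> {
  return {
    parseNext(lexer) {
      // Peek: read the next token's start offset without consuming anything.
      const before = lexer.checkpoint();
      const next = lexer.next();
      lexer.reset(before);
      if (next === null) return null;

      const value = child.parseNext(lexer);
      if (value === null) return null;

      // parseNext always leaves the lexer at the end of a token, so the
      // current checkpoint is the end of the span.
      return { value, span: [next.span[0], lexer.checkpoint()] };
    },
  };
}
```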

If it still ends up being slow, I can try this option:

  • Add the rule that you are always at the beginning of a token, and add previousToken to checkpoints => [checkpoint().token.span[1], checkpoint()]. Could also allow for another optimisation
