[WIP] Extend synchronization points to leverage fixed-size units/fields #1948

rsmmr · 2024-12-17T08:18:29Z

This is still experimental and WIP. The idea is to let error recovery
jump directly forward to specific offsets in the input in cases where
fixed-size units/fields allow us to compute where to continue.

Fix ParserBuilder::advanceInput().
Remove ParserBuilder::setInput().
Enable productions to provide the number of bytes they parse, if known.
Support synchronization points at fixed offsets, without pattern search.

The implementation of `advanceInput()` tried to automatically detect whether it was passed a view or something else. However, that check was unreliable because the type of the argument could still be unresolved (i.e., `auto`). This change removes the broken support for passing a view; it seems that wasn't actually used anywhere anyway (probably precisely because it didn't work).

There was exactly one place using `setInput()`, but there are many other places that don't use it and instead just set `state().cur` directly, which is all that this method did as well.

For productions that parse a static amount of data, that size can now be retrieved. This includes cases where the number is determined through some expression evaluated at runtime, as long as parsing the production is guaranteed to always consume exactly as many bytes as the expression yields.
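To illustrate the idea of a production reporting how many bytes it parses, here is a minimal Python sketch. It is not Spicy's implementation: the classes `FixedField`, `UntilField`, and `Unit` and the function `bytes_consumed` are hypothetical names made up for this example. A size is either a constant or a callable standing in for an expression evaluated at runtime; a composite only has a known size if all of its parts do.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union

# Hypothetical model for illustration only; these are not Spicy classes.
Size = Union[int, Callable[[], int]]


@dataclass
class FixedField:
    """A field that always consumes exactly `size` bytes."""
    size: Size  # constant, or an expression evaluated at runtime


@dataclass
class UntilField:
    """A field read up to a delimiter; its length depends on the input."""
    delimiter: bytes


@dataclass
class Unit:
    """A sequence of productions parsed one after the other."""
    parts: List[object]


def bytes_consumed(production: object) -> Optional[int]:
    """Return the number of bytes the production parses, or None if unknown."""
    if isinstance(production, FixedField):
        size = production.size
        return size() if callable(size) else size
    if isinstance(production, Unit):
        total = 0
        for part in production.parts:
            n = bytes_consumed(part)
            if n is None:
                return None  # one unknown part makes the whole size unknown
            total += n
        return total
    return None  # e.g. UntilField: size cannot be known before parsing
```

In this model, a unit made of a 4-byte field and a field sized by a runtime expression yielding 12 reports 16 bytes, while any unit containing an `UntilField` reports no size at all.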

If a synchronization point (i.e., a field with `&synchronize`) can be determined to reside at a fixed offset from either the start of a unit or a previous synchronization point, we can now leverage that after an error to jump there directly to resume parsing.

This aims to address a situation like this:

    type Chunks = unit {
        chunks: (Chunk &synchronize)[];
    };

    type Chunk = unit {
        content_size: bytes &until=b"\n" &convert=$$.to_uint();
        content: bytes &size=self.content_size;
    };

If an error happens while parsing `content` (e.g., a gap in the input), the top-level `chunks` should be able to just move on to the subsequent chunk because it's clear where that starts (namely after `content_size` bytes).

This proof-of-concept implementation makes this example work, but it isn't very pretty and hasn't been tested in detail; other tests currently fail. Need to clean up and re-assess.
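The recovery behavior described for the chunk example can be simulated outside of Spicy. The following Python sketch is only an illustration under stated assumptions: `parse_chunks`, `ParseError`, and the representation of gaps as `(start, end)` offset ranges are hypothetical and not part of Spicy or this PR. Each chunk is a decimal `content_size` terminated by a newline, followed by that many bytes of content. Once the size is known, the offset of the next chunk is fixed, so an error inside the content can be skipped without any pattern search.

```python
from typing import List, Optional, Tuple


class ParseError(Exception):
    """Raised when a read hits a gap or runs past the end of the input."""


def parse_chunks(data: bytes, gaps: List[Tuple[int, int]]) -> List[Optional[bytes]]:
    """Parse `<size>\\n<content>` chunks; None marks a chunk skipped after an error.

    `gaps` lists half-open offset ranges [start, end) whose bytes are missing.
    """

    def read(start: int, end: int) -> bytes:
        if end > len(data):
            raise ParseError("input truncated")
        for gap_start, gap_end in gaps:
            if start < gap_end and gap_start < end:
                raise ParseError("gap in input")
        return data[start:end]

    results: List[Optional[bytes]] = []
    pos = 0
    while pos < len(data):
        # Header: errors here are not recoverable in this sketch, because
        # without the size there is no fixed offset to jump to.
        newline = data.index(b"\n", pos)
        size = int(read(pos, newline))

        content_start = newline + 1
        resume_at = content_start + size  # fixed offset of the next chunk

        try:
            results.append(read(content_start, resume_at))
        except ParseError:
            results.append(None)  # content lost, but we know where to resume

        pos = resume_at  # jump directly to the next synchronization point
    return results
```

For instance, with the input `b"3\nabc5\nhello2\nhi"` and a gap covering offsets 8 to 10 (inside `hello`), the second chunk is skipped while the first and third still parse.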

rsmmr added 3 commits December 17, 2024 09:14

Remove ParserBuilder::setInput().

736d587


rsmmr changed the title ~~Extend synchronization points for fixed-size units/fields~~ [WIP] Extend synchronization points to leverage fixed-size units/fields Dec 17, 2024

rsmmr added 2 commits December 20, 2024 13:10

rsmmr force-pushed the topic/robin/sync-improvements branch from 680a106 to a13eb04 Compare December 20, 2024 12:23
