[WIP] Extend synchronization points to leverage fixed-size units/fields #1948

Draft: wants to merge 5 commits into main

@rsmmr (Member) commented Dec 17, 2024

This is still experimental and WIP. The idea is to let error recovery
jump directly forward to specific offsets in the input in cases where
fixed-size units/fields allow us to compute where to continue.

  • Fix ParserBuilder::advanceInput().
  • Remove ParserBuilder::setInput().
  • Enable productions to provide the number of bytes they parse, if known.
  • Support synchronization points at fixed offsets, without pattern search.

Fix ParserBuilder::advanceInput(): The implementation tried to
automatically detect whether it was passed a view or something else.
However, that check was unreliable because the type of the argument
could still be unresolved (i.e., `auto`). This change removes that
broken support for passing a view; it seems it wasn't actually used
anywhere anyway (probably precisely because it didn't work).

Remove ParserBuilder::setInput(): There was exactly one place using
it, while many other places instead just set `state().cur` directly,
which is all that this method did as well.

For productions that parse a fixed amount of data, that size can now
be retrieved. This includes cases where the number is determined
through some expression evaluated at runtime, as long as parsing the
production is guaranteed to always consume exactly as many bytes as
the expression yields.
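
To illustrate the idea of productions reporting their size, here is a
minimal Python sketch. It is not Spicy's actual production API: the
`Production` class, `bytes_consumed()`, and `sequence_size()` are
hypothetical names made up for this example. A size is either a
constant, an expression evaluated at runtime, or unknown (`None`).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union

# Hypothetical model only; these are not Spicy's real production classes.
# A size is either a constant or an expression evaluated at runtime
# against the fields parsed so far.
Size = Union[int, Callable[[dict], int]]


@dataclass
class Production:
    name: str
    size: Optional[Size] = None  # None: byte count is not known up front

    def bytes_consumed(self, fields: dict) -> Optional[int]:
        """Return the exact number of bytes this production parses, if known."""
        if self.size is None:
            return None
        return self.size(fields) if callable(self.size) else self.size


def sequence_size(prods: List[Production], fields: dict) -> Optional[int]:
    """A sequence has a known size only if every element has one."""
    total = 0
    for p in prods:
        n = p.bytes_consumed(fields)
        if n is None:
            return None
        total += n
    return total


header = Production("header", 4)                                # constant size
content = Production("content", lambda f: f["content_size"])   # runtime expression
line = Production("line")                                       # size unknown
```

With these definitions, `sequence_size([header, content], {"content_size": 10})`
yields 14, while any sequence containing `line` yields `None`.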
@rsmmr rsmmr changed the title Extend synchronization points for fixed-size units/fields [WIP] Extend synchronization points to leverage fixed-size units/fields Dec 17, 2024
If a synchronization point (i.e., a field with `&synchronize`) can be
determined to reside at a fixed offset from either the start of a unit
or a previous synchronization point, we can now jump there directly
after an error and resume parsing.
This aims to address a situation like this:

    type Chunks = unit {
        chunks: (Chunk &synchronize)[];
    };

    type Chunk = unit {
        content_size: bytes &until=b"\n" &convert=$$.to_uint();
        content: bytes &size=self.content_size;
    };

If an error happens while parsing `content` (e.g., a gap in the
input), the top-level `chunks` should be able to just move on to the
subsequent chunk because it's clear where that starts (namely after
`content_size` bytes).
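
The intended recovery behavior for this example can be sketched in
plain Python. This is only a simulation of the idea, not the code that
Spicy generates: `parse_chunks()`, `read_content()`, `ParseError`, and
the `gap` parameter (a hypothetical `[start, end)` range of unreadable
input) are assumptions for illustration, and the sketch assumes the
gap falls inside a chunk's content rather than its size line.

```python
from typing import List, Optional, Tuple


class ParseError(Exception):
    """Raised when a field cannot be parsed (here: its bytes fall into a gap)."""


def read_content(data: bytes, start: int, size: int,
                 gap: Optional[Tuple[int, int]]) -> bytes:
    end = start + size
    # Fail if the content range [start, end) overlaps the gap.
    if gap is not None and start < gap[1] and gap[0] < end:
        raise ParseError("gap inside content")
    return data[start:end]


def parse_chunks(data: bytes,
                 gap: Optional[Tuple[int, int]] = None) -> List[Optional[bytes]]:
    chunks: List[Optional[bytes]] = []
    cur = 0
    while cur < len(data):
        # content_size: bytes &until=b"\n" &convert=$$.to_uint()
        nl = data.index(b"\n", cur)
        size = int(data[cur:nl].decode("ascii"))
        content_start = nl + 1
        try:
            # content: bytes &size=self.content_size
            chunks.append(read_content(data, content_start, size, gap))
        except ParseError:
            chunks.append(None)  # this chunk is lost
        # Whether or not `content` parsed, the next chunk begins at a
        # fixed offset, so no pattern search is needed to resynchronize.
        cur = content_start + size
    return chunks


data = b"5\nhello3\nabc4\nwxyz"
print(parse_chunks(data))              # all three chunks parse
print(parse_chunks(data, gap=(3, 5)))  # first chunk lost, parsing resumes at the second
```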

This proof-of-concept implementation makes this example work, but it
is neither pretty nor tested in detail, and other tests currently
fail. Need to clean up and re-assess.
@rsmmr rsmmr force-pushed the topic/robin/sync-improvements branch from 680a106 to a13eb04 Compare December 20, 2024 12:23