Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip whitespace in SeqN grammar #1626

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Conversation

DavidLegg
Copy link

Adds a skip expression for whitespace to the SeqN grammar, rather than dealing with whitespace by hand.
This should simplify the grammar, without significantly changing its meaning.

All grammar regression tests for legal SeqN examples passed unchanged.
A few examples of invalid SeqN parse slightly differently, but the new parse tree is still reasonable.
For these examples, I rebaselined the test to the new behavior.

@DavidLegg DavidLegg requested a review from a team as a code owner February 7, 2025 02:16
@DavidLegg
Copy link
Author

One more note to add on this PR: I think there's an opportunity to go farther, and skip newlines and maybe comments as well. I didn't do that here because I think it might expand what counts as legal SeqN, by effectively making all mandatory newlines optional, and allowing you to put comments anywhere. I don't know if that's an acceptable trade-off for having a simpler grammar. From a quick code search, it also looks like comments may be meaningful in some contexts, so skipping them may be completely off the table.

David Legg added 2 commits February 6, 2025 18:29
Add a skip expression for whitespace to the SeqN grammar, rather than dealing with whitespace by hand.
This should simplify the grammar, without significantly changing its meaning.

All grammar regression tests for legal SeqN examples pass unchanged.
A few examples of invalid SeqN parse slightly differently, but the new parse tree is still reasonable.
For these examples, rebaseline the test to the new behavior.
Condense similar rules together and in-line some very simple rules.
Copy link

sonarqubecloud bot commented Feb 7, 2025

Copy link
Contributor

@goetzrrGit goetzrrGit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was a good idea but we are using whitespace a a delimiter. We want the user to have space between certain elements such as time, command stems, and arguments. I believe this change should not be merged in

ex.

A2025-001T10:00:00ECHO"arg1"#test <--- Now valid with the new grammar change

Also you might have ambiguity when parsing arguments specifically with number arguments

ex

C CMD_STEM 1 10
C CMD_STEM 110 

Comment on lines -397 to +398
],
[
'Stem with disallowed characters',
`FSW_CMD%BAR$BAZ`,
`Sequence(Commands(
Command(Stem,⚠),
Command(Stem,⚠),
Command(Stem,Args)
))`,
`Sequence(Commands(Command(TimeTag(TimeComplete),⚠(Number),Stem,Args),Command(Stem,⚠,Args(Enum))))`,
],
['Stem with disallowed characters', `FSW_CMD%BAR$BAZ`, `Sequence(Commands(Command(Stem,⚠,Args(Enum,⚠,Enum))))`],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curios why this test was changed? I don't think it needed to be changed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new grammar just parsed it slightly differently, so I changed the expected output to match. It winds up parsing as a single command with some errors and enum arguments, rather than multiple commands with errors in between.
The formatting was changed by the formatting and linting tools. I'm not sure if those were being applied regularly to this file... they might be doing more harm than good here.

@goetzrrGit goetzrrGit added the DON'T MERGE Do Not Merge This Branch label Feb 11, 2025
@DavidLegg
Copy link
Author

Re: the way skipping whitespace works, it's true that this would allow that "smashed together" syntax. If that's unacceptable, we could perhaps try adding "word boundary" elements into the tokens. That would demand that there are non-word characters (whitespace being the only choice that would match the rest of the grammar) between the various tokens, without actually including those non-word characters in the token we match.

As for this example:

C CMD_STEM 1 10
C CMD_STEM 110 

That's actually not ambiguous, because the "skip" happens after the tokenizer runs. That is, this gets parsed as the following tokens:

"C ", "CMD_STEM", whitespace, "1", whitespace, "10", whitespace,
"C ", "CMD_STEM", whitespace, "110", whitespace

then the whitespace is dropped, and finally the grammar is applied, resulting in this parse tree (ignore the line numbers, I just tacked this onto another example I was working on)

    ┣━  Command [9:0..10:0]
    ┃   ┣━  TimeTag [9:0..9:2]
    ┃   ┃   ┗━  TimeComplete [9:0..9:2]: "C "
    ┃   ┣━  Stem [9:2..9:10]: "CMD_STEM"
    ┃   ┗━  Args [9:11..9:15]
    ┃       ┣━  Number [9:11..9:12]: "1"
    ┃       ┗━  Number [9:13..9:15]: "10"
    ┗━  Command [10:0..11:0]
        ┣━  TimeTag [10:0..10:2]
        ┃   ┗━  TimeComplete [10:0..10:2]: "C "
        ┣━  Stem [10:2..10:10]: "CMD_STEM"
        ┗━  Args [10:11..10:14]
            ┗━  Number [10:11..10:14]: "110"

All of that said, I think we're going to go a different route for sequence templates, that doesn't need to modify the SeqN grammar. So, feel free to take this if you think it'll help with maintenance, or close it. It won't interfere with the sequence templates either way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DON'T MERGE Do Not Merge This Branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants