Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(parser): reduce work parsing regexps #1999

Merged
merged 1 commit into from
Jan 12, 2024

Conversation

overlookmotel
Copy link
Contributor

@overlookmotel overlookmotel commented Jan 11, 2024

#1926 produced a small performance regression because when parsing a regexp, some work is repeated.

Hopefully this PR may reverse it.

Perhaps implementation is a little messy, but posting it now as want to see what Codspeed says.

@github-actions github-actions bot added the A-parser Area - Parser label Jan 11, 2024
Copy link

codspeed-hq bot commented Jan 11, 2024

CodSpeed Performance Report

Merging #1999 will not alter performance

Comparing overlookmotel:parser-regex (efee37e) with main (0a08686)

Summary

✅ 14 untouched benchmarks

@overlookmotel
Copy link
Contributor Author

Damn it! This makes no difference whatsoever!

Any idea why performance dropped slightly on #1926? Or was it just noise in the benchmarks?

@Boshen
Copy link
Member

Boshen commented Jan 12, 2024

Damn it! This makes no difference whatsoever!

Any idea why performance dropped slightly on #1926? Or was it just noise in the benchmarks?

Just noise.


I forgot that regex has it's own standalone code path for getting the token, this PR is still good because it removes the weird re-parsing code from parse_regex.

There is no perf different because this is such a minor code path, and probably there's only a few regexes in the benchmark.

@Boshen Boshen merged commit c731685 into oxc-project:main Jan 12, 2024
17 checks passed
@overlookmotel
Copy link
Contributor Author

Thanks. One question: You said on #1926:

In order to make things work nicely, the lexer will no longer recover
from a invalid regex.

Was that a desired outcome, or a compromise you had to make to get it working?

If the latter, I think the old behavior could now be restored after this PR by changing the breaks back into continues in the 2nd loop in read_regex:

while let Some(ch @ ('$' | '_' | 'a'..='z' | 'A'..='Z' | '0'..='9')) = self.peek() {
self.current.chars.next();
if !ch.is_ascii_lowercase() {
self.error(diagnostics::RegExpFlag(ch, self.current_offset()));
break;
}
let flag = if let Ok(flag) = RegExpFlags::try_from(ch) {
flag
} else {
self.error(diagnostics::RegExpFlag(ch, self.current_offset()));
break;
};
if flags.contains(flag) {
self.error(diagnostics::RegExpFlagTwice(ch, self.current_offset()));
break;
}
flags |= flag;
}

@overlookmotel overlookmotel deleted the parser-regex branch January 12, 2024 11:24
@Boshen
Copy link
Member

Boshen commented Jan 12, 2024

Was that a desired outcome, or a compromise you had to make to get it working?

A compromise.

But leaving it as a hard error is also fine, I never had any rules for recoverability 😅

Boshen pushed a commit that referenced this pull request Jan 12, 2024
As discussed in
#1999 (comment),
this PR restores some of regex parsing behavior to as it was prior to
#1926.
IWANABETHATGUY pushed a commit to IWANABETHATGUY/oxc that referenced this pull request May 29, 2024
oxc-project#1926 produced a small performance regression because when parsing a
regexp, some work is repeated.
IWANABETHATGUY pushed a commit to IWANABETHATGUY/oxc that referenced this pull request May 29, 2024
As discussed in
oxc-project#1999 (comment),
this PR restores some of regex parsing behavior to as it was prior to
oxc-project#1926.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area - Parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants