
perf(parser): faster line comment #2327

Closed
wants to merge 2 commits into from

Conversation

sno2
Contributor

@sno2 sno2 commented Feb 5, 2024

Utilizes memchr to find the end of the line comment, which _might_ yield performance improvements. Also adds a new `Source::eat_until_byte3`, along with `Source::set_eof`, that allows us to naturally integrate memchr-based searches into the lexer.

Thankfully, memchr does not have any dependencies, so this change does not add much bloat to the binary.
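As a rough illustration of the idea (a sketch only, not the PR's actual `Source::eat_until_byte3` implementation; the function name below is made up): LF and CR are single bytes, while LS (U+2028) and PS (U+2029) are multi-byte UTF-8 sequences that both begin with 0xE2, so a three-needle `memchr3` search can cover all four terminators, with a scalar check when the hit is 0xE2.

```rust
// Sketch only: find the offset of the first byte that could end a line
// comment. LF and CR are single-byte; LS/PS are 3-byte sequences whose
// lead byte is 0xE2, so a 0xE2 hit still needs a scalar follow-up check.
fn find_line_comment_end(rest: &[u8]) -> Option<usize> {
    memchr::memchr3(b'\n', b'\r', 0xE2, rest)
}
```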

@github-actions github-actions bot added the A-parser Area - Parser label Feb 5, 2024
@Boshen Boshen requested a review from overlookmotel February 5, 2024 17:32

codspeed-hq bot commented Feb 5, 2024

CodSpeed Performance Report

Merging #2327 will improve performance by 6.03%

⚠️ No base runs were found

Falling back to comparing sno2:perf/line-comment-fast (c13e30e) with main (6fe9300)

Summary

⚡ 1 improvement
✅ 26 untouched benchmarks

Benchmarks breakdown

| Benchmark | main | sno2:perf/line-comment-fast | Change |
| --- | --- | --- | --- |
| lexer[checker.ts] | 144.2 ms | 136 ms | +6.03% |

@overlookmotel overlookmotel mentioned this pull request Feb 5, 2024
@overlookmotel
Contributor

@sno2 Thanks for this. Not had time to review yet, but can I ask one question: Is this intended as an effort towards #2296?

@sno2
Contributor Author

sno2 commented Feb 6, 2024

> @sno2 Thanks for this. Not had time to review yet, but can I ask one question: Is this intended as an effort towards #2296?

This was more of an experiment to see how well memchr would fare for this use case. It is not that useful, though, because most of the other SIMD-oriented parts of the lexer need to search for more than 3 distinct bytes, which is all memchr supports. To be frank, this may not be worth merging at all given how little it affected performance.

@Boshen
Member

Boshen commented Feb 6, 2024

> To be frank, this may not be worth merging at all given how little it affected performance

This sped up checker.ts which has lots of comments.

[benchmark screenshot]

@overlookmotel
Contributor

Thanks very much for coming back. Yes, the performance gain is a little underwhelming. I would have hoped for more, especially given that on CI the SIMD lane size is probably 32 (I assume GitHub's runners support AVX2, but I don't think memchr supports AVX-512 to do 64-byte batches).

Nonetheless, it would be worth doing, as it does provide some speedup in heavily-commented code. And many incremental gains add up.

But... I'm afraid I think you're missing one edge case. It also needs to handle 0x0B and 0x0C (irregular whitespace) as these need "trivia"s created for them if they're found.

That means there are 5 possible bytes to search for, which rules out memchr, as it only supports up to 3.
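For concreteness, the stop set presumably looks something like the following (purely illustrative; the function name is not oxc's, and 0xE2 stands in for the shared UTF-8 lead byte of LS/PS):

```rust
// Illustrative only: the five byte values a line-comment scanner would have
// to stop on once 0x0B and 0x0C are included. A 0xE2 hit still needs a
// scalar check to distinguish LS (U+2028) / PS (U+2029) from other chars.
const fn is_comment_stop_byte(b: u8) -> bool {
    matches!(b, b'\n' | b'\r' | 0x0B | 0x0C | 0xE2)
}
```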

In my opinion (and please note I am no expert on this stuff, so could be wrong), we need to use a different approach from memchr's, and use "pshufd"-based table lookups to support searching for any number of possible bytes. If you're interested, I linked to an article about this on #2296.
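To give a sense of the shape such a search takes, here is a sketch that uses plain SSE2 splat-and-compare rather than the shuffle-based table lookup described above; the function name and byte set are assumptions for illustration, not oxc code:

```rust
// Sketch only: classify 16 bytes at once against the 5-byte stop set, then
// locate the first hit. A shuffle-based table lookup would replace the five
// compare/OR steps with a single lookup, but the overall structure
// (vector classify -> movemask -> trailing_zeros) stays the same.
#[cfg(target_arch = "x86_64")]
fn first_stop_byte(chunk: &[u8; 16]) -> Option<usize> {
    use std::arch::x86_64::*;
    // SAFETY: every intrinsic used here is SSE2, which is baseline on x86_64.
    let mask = unsafe {
        let v = _mm_loadu_si128(chunk.as_ptr().cast::<__m128i>());
        let hits = _mm_or_si128(
            _mm_or_si128(
                _mm_cmpeq_epi8(v, _mm_set1_epi8(b'\n' as i8)),
                _mm_cmpeq_epi8(v, _mm_set1_epi8(b'\r' as i8)),
            ),
            _mm_or_si128(
                _mm_or_si128(
                    _mm_cmpeq_epi8(v, _mm_set1_epi8(0x0B)),
                    _mm_cmpeq_epi8(v, _mm_set1_epi8(0x0C)),
                ),
                _mm_cmpeq_epi8(v, _mm_set1_epi8(0xE2u8 as i8)),
            ),
        );
        _mm_movemask_epi8(hits) as u32
    };
    (mask != 0).then(|| mask.trailing_zeros() as usize)
}
```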

So ultimately, I'm afraid I don't think this specific PR is viable. But that doesn't mean it was wasted time. It is always valuable to test and rule out different possibilities. The result is surprising (at least to me), so thank you for exploring this avenue.

If you're willing and keen to push further down this path towards SIMD, but don't want to dig into writing SIMD assembler code line by line (what sane person does?), there is one massive thing you could help with: I find it hard to believe there is not already a crate available which does "pshufd"-based byte searching, as it seems like a common need. But I can't find one! If you were able to track one down, that would basically solve how to do SIMD in OXC entirely.

@Boshen Boshen marked this pull request as draft February 6, 2024 14:02
@sno2
Contributor Author

sno2 commented Feb 6, 2024

> But... I'm afraid I think you're missing one edge case. It also needs to handle 0x0B and 0x0C (irregular whitespace) as these need "trivia"s created for them if they're found.

I don't believe 0x0B and 0x0C were handled properly in the first place:

```rust
/// U+000A LINE FEED, abbreviated in the spec as `<LF>`.
pub const LF: char = '\u{a}';
/// U+000D CARRIAGE RETURN, abbreviated in the spec as `<CR>`.
pub const CR: char = '\u{d}';
/// U+2028 LINE SEPARATOR, abbreviated `<LS>`.
pub const LS: char = '\u{2028}';
/// U+2029 PARAGRAPH SEPARATOR, abbreviated `<PS>`.
pub const PS: char = '\u{2029}';

pub fn is_regular_line_terminator(c: char) -> bool {
    matches!(c, LF | CR)
}

pub fn is_irregular_line_terminator(c: char) -> bool {
    matches!(c, LS | PS)
}

pub fn is_line_terminator(c: char) -> bool {
    is_regular_line_terminator(c) || is_irregular_line_terminator(c)
}
```

```rust
while let Some(c) = self.next_char() {
    if is_line_terminator(c) {
        self.token.is_on_new_line = true;
        self.trivia_builder
            .add_single_line_comment(start, self.offset() - c.len_utf8() as u32);
        return Kind::Skip;
    }
}
```

> In my opinion (and please note I am no expert on this stuff, so could be wrong), we need to use a different approach from memchr's, and use "pshufd"-based table lookups to support searching for any number of possible bytes. If you're interested, I linked to an article about this on #2296.

I am looking into how we could try to implement this.

> So ultimately, I'm afraid I don't think this specific PR is viable. But that doesn't mean it was wasted time. It is always valuable to test and rule out different possibilities. The result is surprising (at least to me), so thank you for exploring this avenue.

Yep, I agree.

> If you're willing and keen to push further down this path towards SIMD, but don't want to dig into writing SIMD assembler code line by line (what sane person does?), there is one massive thing you could help with: I find it hard to believe there is not already a crate available which does "pshufd"-based byte searching, as it seems like a common need. But I can't find one! If you were able to track one down, that would basically solve how to do SIMD in OXC entirely.

I am not sure it will be that simple. In many cases we need to defer to a scalar version whenever we see non-ASCII characters, since handling those is not possible with purely byte-based searches.

@sno2 sno2 closed this Feb 6, 2024
@overlookmotel
Contributor

> I am not sure it will be that simple. In many cases we need to defer to a scalar version whenever we see non-ASCII characters, since handling those is not possible with purely byte-based searches.

I'm currently working on a rewrite of lexing identifiers. It doesn't use SIMD, but is designed to be easily convertible to SIMD later: https://github.com/overlookmotel/oxc/blob/lexer-string5/crates/oxc_parser/src/lexer/identifier.rs
(please excuse the messy code, still very much a work in progress)

You're right about having to bust out to scalar if Unicode chars are found, but the approach I've taken is to speed through ASCII as fast as possible and treat Unicode as a de-opt (on a `#[cold]` path, so the compiler assumes it won't happen in most cases). This approach seems to work: it produces about a 60% speed-up on the lexer benchmarks. Obviously, using SIMD would be a further speed boost.

But, yeah, I think this is one approach to the problem you've identified.
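For anyone following along, here is a minimal sketch of that "ASCII fast path, Unicode de-opt" shape; the names and details are illustrative only and do not come from the linked oxc branch:

```rust
// Sketch only: consume identifier bytes on the assumption that almost all
// input is ASCII, and only fall back to full Unicode handling (on a #[cold]
// path) when a non-ASCII byte shows up.
fn eat_identifier_tail(bytes: &[u8], mut pos: usize) -> usize {
    while let Some(&b) = bytes.get(pos) {
        if b.is_ascii_alphanumeric() || b == b'_' || b == b'$' {
            pos += 1; // ASCII identifier byte: stay on the fast path.
        } else if b.is_ascii() {
            return pos; // ASCII, but not an identifier char: done.
        } else {
            return eat_identifier_tail_unicode(bytes, pos); // rare de-opt
        }
    }
    pos
}

#[cold]
#[inline(never)]
fn eat_identifier_tail_unicode(bytes: &[u8], pos: usize) -> usize {
    // Slow path: decode chars and apply the full `ID_Continue` check
    // (details elided in this sketch).
    let _ = bytes;
    pos
}
```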
