
perf(parser): faster line comment #2327

Closed
wants to merge 2 commits into from

Conversation

sno2
Contributor

@sno2 sno2 commented Feb 5, 2024

Utilizes memchr to find the end of the line comment, which _might_ yield performance improvements. Also adds a new `Source::eat_until_byte3`, along with `Source::set_eof`, that allows us to naturally integrate memchr-based searches into the lexer.

Thankfully, memchr does not have any dependencies, so this change does not add much bloat to the binary.
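As a rough illustration of the idea (a sketch only, not the PR's actual `Source::eat_until_byte3` implementation; the function name below is made up): LF and CR are single bytes, while LS (U+2028) and PS (U+2029) are multi-byte UTF-8 sequences that both begin with 0xE2, so a three-needle `memchr3` search can cover all four terminators, with a scalar check when the hit is 0xE2.

```rust
// Sketch only: find the offset of the first byte that could end a line
// comment. LF and CR are single-byte; LS/PS are 3-byte sequences whose
// lead byte is 0xE2, so a 0xE2 hit still needs a scalar follow-up check.
fn find_line_comment_end(rest: &[u8]) -> Option<usize> {
    memchr::memchr3(b'\n', b'\r', 0xE2, rest)
}
```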

@github-actions github-actions bot added the A-parser Area - Parser label Feb 5, 2024
@Boshen Boshen requested a review from overlookmotel February 5, 2024 17:32

codspeed-hq bot commented Feb 5, 2024

CodSpeed Performance Report

Merging #2327 will improve performance by 6.03%

⚠️ No base runs were found

Falling back to comparing sno2:perf/line-comment-fast (c13e30e) with main (6fe9300)

Summary

⚡ 1 improvement
✅ 26 untouched benchmarks

Benchmarks breakdown

| Benchmark | main | sno2:perf/line-comment-fast | Change |
| --- | --- | --- | --- |
| lexer[checker.ts] | 144.2 ms | 136 ms | +6.03% |

@overlookmotel overlookmotel mentioned this pull request Feb 5, 2024
@overlookmotel
Contributor

@sno2 Thanks for this. Not had time to review yet, but can I ask one question: Is this intended as an effort towards #2296?

@sno2
Contributor Author

sno2 commented Feb 6, 2024

> @sno2 Thanks for this. Not had time to review yet, but can I ask one question: Is this intended as an effort towards #2296?

This was more of an experiment to see how well memchr would fare for this use case. It is not that useful, though, because most of the other SIMD-oriented parts of the lexer need to search for more than 3 distinct bytes, which is all memchr supports. To be frank, this may not be worth merging at all given how little it affected performance.

@Boshen
Member

Boshen commented Feb 6, 2024

> To be frank, this may not be worth merging at all given how little it affected performance

This sped up checker.ts which has lots of comments.

[benchmark screenshot]

@overlookmotel
Contributor

Thanks very much for coming back. Yes, the performance gain is a little underwhelming. I would have hoped for more, especially given that on CI the SIMD lane size is probably 32 (I assume GitHub's runners support AVX2, but I don't think memchr supports AVX-512 to do 64-byte batches).

Nonetheless, it would be worth doing, as it does provide some speedup in heavily-commented code. And many incremental gains add up.

But... I'm afraid I think you're missing one edge case. It also needs to handle 0x0B and 0x0C (irregular whitespace) as these need "trivia"s created for them if they're found.

That means there are 5 possible bytes to search for, which rules out memchr, as it only supports up to 3.
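For concreteness, the stop set presumably looks something like the following (purely illustrative; the function name is not oxc's, and 0xE2 stands in for the shared UTF-8 lead byte of LS/PS):

```rust
// Illustrative only: the five byte values a line-comment scanner would have
// to stop on once 0x0B and 0x0C are included. A 0xE2 hit still needs a
// scalar check to distinguish LS (U+2028) / PS (U+2029) from other chars.
const fn is_comment_stop_byte(b: u8) -> bool {
    matches!(b, b'\n' | b'\r' | 0x0B | 0x0C | 0xE2)
}
```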

In my opinion (and please note I am no expert on this stuff, so could be wrong), we need to use a different approach from memchr's, and use "pshufd"-based table lookups to support searching for any number of possible bytes. If you're interested, I linked to an article about this on #2296.
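To give a sense of the shape such a search takes, here is a sketch that uses plain SSE2 splat-and-compare rather than the shuffle-based table lookup described above; the function name and byte set are assumptions for illustration, not oxc code:

```rust
// Sketch only: classify 16 bytes at once against the 5-byte stop set, then
// locate the first hit. A shuffle-based table lookup would replace the five
// compare/OR steps with a single lookup, but the overall structure
// (vector classify -> movemask -> trailing_zeros) stays the same.
#[cfg(target_arch = "x86_64")]
fn first_stop_byte(chunk: &[u8; 16]) -> Option<usize> {
    use std::arch::x86_64::*;
    // SAFETY: every intrinsic used here is SSE2, which is baseline on x86_64.
    let mask = unsafe {
        let v = _mm_loadu_si128(chunk.as_ptr().cast::<__m128i>());
        let hits = _mm_or_si128(
            _mm_or_si128(
                _mm_cmpeq_epi8(v, _mm_set1_epi8(b'\n' as i8)),
                _mm_cmpeq_epi8(v, _mm_set1_epi8(b'\r' as i8)),
            ),
            _mm_or_si128(
                _mm_or_si128(
                    _mm_cmpeq_epi8(v, _mm_set1_epi8(0x0B)),
                    _mm_cmpeq_epi8(v, _mm_set1_epi8(0x0C)),
                ),
                _mm_cmpeq_epi8(v, _mm_set1_epi8(0xE2u8 as i8)),
            ),
        );
        _mm_movemask_epi8(hits) as u32
    };
    (mask != 0).then(|| mask.trailing_zeros() as usize)
}
```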

So ultimately, I'm afraid I don't think this specific PR is viable. But that doesn't mean it was wasted time. It is always valuable to test and rule out different possibilities. The result is surprising (at least to me), so thank you for exploring this avenue.

If you're willing and keen to push further down this path towards SIMD, but don't want to dig into writing SIMD assembler code line by line (what sane person does?), there is one massive thing you could help with: I find it hard to believe there is not already a crate available which does "pshufd"-based byte searching, as it seems like a common need. But I can't find one! If you were able to track one down, that would basically solve how to do SIMD in OXC entirely.

@Boshen Boshen marked this pull request as draft February 6, 2024 14:02
@sno2
Contributor Author

sno2 commented Feb 6, 2024

> But... I'm afraid I think you're missing one edge case. It also needs to handle 0x0B and 0x0C (irregular whitespace) as these need "trivia"s created for them if they're found.

I don't believe 0x0B and 0x0C were handled properly in the first place:

```rust
/// U+000A LINE FEED, abbreviated in the spec as `<LF>`.
pub const LF: char = '\u{a}';
/// U+000D CARRIAGE RETURN, abbreviated in the spec as `<CR>`.
pub const CR: char = '\u{d}';
/// U+2028 LINE SEPARATOR, abbreviated `<LS>`.
pub const LS: char = '\u{2028}';
/// U+2029 PARAGRAPH SEPARATOR, abbreviated `<PS>`.
pub const PS: char = '\u{2029}';

pub fn is_regular_line_terminator(c: char) -> bool {
    matches!(c, LF | CR)
}

pub fn is_irregular_line_terminator(c: char) -> bool {
    matches!(c, LS | PS)
}

pub fn is_line_terminator(c: char) -> bool {
    is_regular_line_terminator(c) || is_irregular_line_terminator(c)
}
```

```rust
while let Some(c) = self.next_char() {
    if is_line_terminator(c) {
        self.token.is_on_new_line = true;
        self.trivia_builder
            .add_single_line_comment(start, self.offset() - c.len_utf8() as u32);
        return Kind::Skip;
    }
}
```

> In my opinion (and please note I am no expert on this stuff, so could be wrong), we need to use a different approach from memchr's, and use "pshufd"-based table lookups to support searching for any number of possible bytes. If you're interested, I linked to an article about this on #2296.

I am looking into how we could try to implement this.

> So ultimately, I'm afraid I don't think this specific PR is viable. But that doesn't mean it was wasted time. It is always valuable to test and rule out different possibilities. The result is surprising (at least to me), so thank you for exploring this avenue.

Yep, I agree.

> If you're willing and keen to push further down this path towards SIMD, but don't want to dig into writing SIMD assembler code line by line (what sane person does?), there is one massive thing you could help with: I find it hard to believe there is not already a crate available which does "pshufd"-based byte searching, as it seems like a common need. But I can't find one! If you were able to track one down, that would basically solve how to do SIMD in OXC entirely.

I am not sure it will be that simple. In many cases we need to defer to a scalar version whenever we see non-ASCII characters, since handling those is not possible with purely byte-based searches.

@sno2 sno2 closed this Feb 6, 2024
@overlookmotel
Contributor

> I am not sure it will be that simple. In many cases we need to defer to a scalar version whenever we see non-ASCII characters, since handling those is not possible with purely byte-based searches.

I'm currently working on a rewrite of lexing identifiers. It doesn't use SIMD, but is designed to be easily convertible to SIMD later: https://github.com/overlookmotel/oxc/blob/lexer-string5/crates/oxc_parser/src/lexer/identifier.rs
(please excuse the messy code, still very much a work in progress)

You're right about having to bust out to scalar if Unicode chars are found, but the approach I've taken is to speed through ASCII as fast as possible and treat Unicode as a de-opt (on a `#[cold]` path, so the compiler assumes it won't happen in most cases). This approach seems to work: it produces about a 60% speed-up on the lexer benchmarks. Obviously, using SIMD would be a further speed boost.

But, yeah, I think this is one approach to the problem you've identified.
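For anyone following along, here is a minimal sketch of that "ASCII fast path, Unicode de-opt" shape; the names and details are illustrative only and do not come from the linked oxc branch:

```rust
// Sketch only: consume identifier bytes on the assumption that almost all
// input is ASCII, and only fall back to full Unicode handling (on a #[cold]
// path) when a non-ASCII byte shows up.
fn eat_identifier_tail(bytes: &[u8], mut pos: usize) -> usize {
    while let Some(&b) = bytes.get(pos) {
        if b.is_ascii_alphanumeric() || b == b'_' || b == b'$' {
            pos += 1; // ASCII identifier byte: stay on the fast path.
        } else if b.is_ascii() {
            return pos; // ASCII, but not an identifier char: done.
        } else {
            return eat_identifier_tail_unicode(bytes, pos); // rare de-opt
        }
    }
    pos
}

#[cold]
#[inline(never)]
fn eat_identifier_tail_unicode(bytes: &[u8], pos: usize) -> usize {
    // Slow path: decode chars and apply the full `ID_Continue` check
    // (details elided in this sketch).
    let _ = bytes;
    pos
}
```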
