-
-
Notifications
You must be signed in to change notification settings - Fork 484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(lexer): add SIMD optimization to the lexer #2338
Conversation
+r @Boshen @overlookmotel |
Hi @dyxushuai. I'm afraid I'm having difficulty reviewing this, as it's quite complex and you're covering all the bases in one go. Can I suggest a few things which would make it simpler to understand:
Ideally, CI would also pass. Though if that's difficult, feel free to ask for review again without that. |
I am on vacation and have been offline for a while.
Yes, I will add more test cases in my implements
Sure, Always try to minimize overhead as much as possible.
Cool
Ok.
I will explain the instructions as much as possible. More details here: seanmonstar/httparse#89 (comment) |
…eat/simd_in_lexer
All tests passed, next I will add benchmark comparison. |
Currently, we only support SIMD matches for ASCII codes (<128) because NEON supports only 6x16 vectors. If we aim to handle all u8::MAX bytes at once, we would require a minimum of 16x16 vector support. BTW, we can split the lookups into two parts oxc/crates/oxc_parser/src/lexer/comment.rs Lines 18 to 22 in 065584c
|
@overlookmotel @Brooooooklyn @Boshen Bench parserFor now, we only support parse Platforms
Commands
With AVX2parser/checker.ts time: [13.561 ms 13.583 ms 13.607 ms]
change: [-7.8967% -7.7080% -7.5257%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
13 (13.00%) high mild
parser/cal.com.tsx time: [6.2570 ms 6.2633 ms 6.2704 ms]
change: [-4.9506% -4.8115% -4.6668%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
parser/RadixUIAdoptionSection.jsx
time: [13.837 µs 13.862 µs 13.890 µs]
change: [+1.7289% +1.9880% +2.2377%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
parser/pdf.mjs time: [4.4353 ms 4.4394 ms 4.4440 ms]
change: [-5.4993% -5.3649% -5.2352%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
parser/antd.js time: [27.754 ms 27.782 ms 27.812 ms]
change: [-4.6780% -4.5492% -4.4233%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe With SSE4.2parser/checker.ts time: [14.199 ms 14.216 ms 14.234 ms]
change: [-5.4070% -5.2090% -5.0262%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
parser/cal.com.tsx time: [6.3727 ms 6.3808 ms 6.3898 ms]
change: [-3.4983% -3.3151% -3.1209%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
parser/RadixUIAdoptionSection.jsx
time: [13.954 µs 13.994 µs 14.037 µs]
change: [+1.6748% +1.9549% +2.2534%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
parser/pdf.mjs time: [4.5373 ms 4.5431 ms 4.5494 ms]
change: [-3.8748% -3.6909% -3.4907%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
8 (8.00%) high mild
1 (1.00%) high severe
parser/antd.js time: [28.388 ms 28.422 ms 28.458 ms]
change: [-4.3719% -4.1995% -4.0308%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild With NEONTODO |
@dyxushuai This is looking really great. Like I said, I won't have time to get into it properly this week. But at a first glance it's shaping up really nicely. Can't wait to see the benchmarks. A couple of points for OXC maintainers: Is there any reason not to run CI and benchmarks on this now? I don't want to press the button myself, since I'm not sure why it's not automatic anyway. Regardless of the results and quality of the implementation, in my opinion I don't think this is safe to merge as yet. This introduces different implementations for different processors, and to make sure nothing is breaking/regressing (either now or in the future), it'd be ideal to have tests, conformance, and benchmarks run on all these targets (as discussed in #2285). Either we'd need to get all that in place before merging, or possibly the parts which target the architectures which are already tested on CI could be merged sooner, and the rest later (as long as the scalar fallbacks which run on other archs for now are tested+benched too). This isn't a negative on your fine work dyxushuai, just this is a major change, and I think we need to tread carefully. |
https://doc.rust-lang.org/beta/core/arch/wasm/fn.i8x16_shuffle.html I see that WASM has enough SIMD instructions to implement the same lookup table algorithm as other platforms. |
@dyxushuai Try https://github.com/BurntSushi/critcmp This whole PR is shaping up really nicely, I think the hardest part for us maintainers is how we continuously integrate SIMD into the codebase, let my take a few days to think about this. I'd like to invite some SIMD experts to review this PR, is it possible to document the entry functions so anyone can read the code from start to bottom without knowing any context? |
@dyxushuai If this is fun for you, and if you are willing to maintain a separate crate for all the simd stuff inside the oxc project, I'll give you all the correct permissions so we can setup proper test infrastructure in another github repository. |
CodSpeed Performance ReportMerging #2338 will degrade performances by 35.07%Comparing Summary
Benchmarks breakdown
|
cool i want to try |
I have identified the root cause, which is that padding was used for buffers that have not enough length with ALIGNMENT, and then more padding was repeatedly added to the end as needed if we haven't finished using it. I will fall back to using the |
@dyxushuai Sorry for long silence. I'll be able to review this over the weekend. Could you possibly squash the commits and rebase on main? It's quite hard to review at present due to the number of commits. |
Of course. Sorry for the delay, I took a long vacation during Chinese New Year. Currently, I am analyzing the performance issue and this is not the final version of my PR. |
No worries at all. If you'd prefer to do more work before rebasing, by all means do that. Just I have some time spare this weekend, and feels like it'd be a good chance to take a look at this properly, even if it's not final. I might have some ideas about the performance issue too, if that'd be helpful. |
@dyxushuai It looks like your attention has moved elsewhere (totally understandable, such is the way of open source, we all have other priorities to deal with). Just wanted to check in if you were interested in reviving this effort? Otherwise, I think we'll close this, and can revisit SIMD another time. |
@overlookmotel are you comfortable with us closing this PR? |
Yes, unfortunately it looks like it's unlikely to move forwards at present. We can return to this later and build on this work. NB: There are some good SIMD table lookup techniques in this PR. It is worth looking at when we make another stab at introducing SIMD (either in lexer, or elsewhere in Oxc, like source map generation). |
Related: #2296
Summary
This implementation can support matches multiple delimiters (0..256) at once on multiple bytes. For example:
On the CPU which supports AVX2 instruction, and we have 10 delimiters need to be matched, the we can matches
32bytes * 10delimiters
at once!Supported features:
Supported pattern
Tricky issues:
Useu8
instead ofchar
with SIMD.Refactor theSource
API.Refactor the method of the character-searching.