Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add runtime detection #55

Closed
wants to merge 6 commits into from

Conversation

aumetra
Copy link
Contributor

@aumetra aumetra commented Jan 5, 2024

What type of PR is this?

feat: A new feature

  • This PR title match the format: <type>(optional scope): <description>
  • The description of this PR title is user-oriented and clear enough for others to understand.

- [ ] Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

(Optional) More detailed description for this PR(en: English/zh: Chinese).

This PR implements runtime detection of the SSE2, NEON, and AVX2 features, queried via the is_x86_feature_enabled and is_aarch64_feature_enabled macros (gated behind the runtime-detection macro).

(Optional) Which issue(s) this PR fixes:

Closes #14

(optional) The PR that updates user documentation:

@CLAassistant
Copy link

CLAassistant commented Jan 5, 2024

CLA assistant check
All committers have signed the CLA.

@aumetra
Copy link
Contributor Author

aumetra commented Jan 5, 2024

I only updated the english README, sorry, I don't speak Chinese.

@aumetra
Copy link
Contributor Author

aumetra commented Jan 5, 2024

Also, I don't have access to an aarch64 CPU with NEON support, so I can't test the code.
But the CI has access to such runners, right?

@liuq19
Copy link
Collaborator

liuq19 commented Jan 5, 2024

Thanks for your PR. I tested in aarch64, and the tests failed.

The SIMD operators are executed by dispatch, which may hurt the performance.

We should dispatch at the function call( such as parse_string) and even the whole Parser.

@aumetra
Copy link
Contributor Author

aumetra commented Jan 5, 2024

The SIMD operators are executed by dispatch, which may hurt the performance.
We should dispatch at the function call( such as parse_string) and even the whole Parser.

Yeah, makes sense. I just chose this design for now since it was pretty intuitive to implement. Some enum dispatch here, a few checks there, and it pretty much worked (well, minus the aarch64 architecture. It's a bit hard to test for me, so it will probably end up with CI golfing).

And about the performance difference, I'll run the benches and check the difference between the runtime-dispatch feature enabled and disabled.
Then we can maybe judge whether the performance penalties are acceptable.
Because for fully high-performance use-cases where the chipset is already known in advance, target_cpu can be set and the runtime-dispatch feature can be disabled.

@aumetra
Copy link
Contributor Author

aumetra commented Jan 5, 2024

Well, this fail doesnt look.. pretty: https://github.com/cloudwego/sonic-rs/actions/runs/7423555607/job/20201409871?pr=55#step:5:15

But technically it's not an issue of the code in itself. It's the compiler issuing an illegal instruction, which is really peculiar

@liuq19
Copy link
Collaborator

liuq19 commented Jan 9, 2024

The SIMD operators are executed by dispatch, which may hurt the performance.
We should dispatch at the function call( such as parse_string) and even the whole Parser.

Yeah, makes sense. I just chose this design for now since it was pretty intuitive to implement. Some enum dispatch here, a few checks there, and it pretty much worked (well, minus the aarch64 architecture. It's a bit hard to test for me, so it will probably end up with CI golfing).

And about the performance difference, I'll run the benches and check the difference between the runtime-dispatch feature enabled and disabled. Then we can maybe judge whether the performance penalties are acceptable. Because for fully high-performance use-cases where the chipset is already known in advance, target_cpu can be set and the runtime-dispatch feature can be disabled.

I benched and the runtime detection performance loss (decrease 20%~80%) is unacceptable.

We should dispatch in large functions to make sure the performance loss is minimal.

twitter/sonic_rs::from_slice_unchecked
                        time:   [1.0678 ms 1.0858 ms 1.1074 ms]
                        change: [+63.401% +66.666% +70.829%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
twitter/sonic_rs::from_slice
                        time:   [1.1441 ms 1.1788 ms 1.2171 ms]
                        change: [+64.267% +69.801% +76.125%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) high mild
  10 (10.00%) high severe

citm_catalog/sonic_rs::from_slice_unchecked
                        time:   [2.2308 ms 2.2454 ms 2.2616 ms]
                        change: [+79.630% +81.598% +83.422%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
citm_catalog/sonic_rs::from_slice
                        time:   [2.3011 ms 2.3200 ms 2.3423 ms]
                        change: [+78.637% +80.635% +82.586%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

canada/sonic_rs::from_slice_unchecked
                        time:   [5.0099 ms 5.1059 ms 5.2245 ms]
                        change: [+26.169% +28.795% +31.648%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
canada/sonic_rs::from_slice
                        time:   [5.2193 ms 5.2627 ms 5.3117 ms]
                        change: [+19.979% +23.366% +26.276%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

@liuq19
Copy link
Collaborator

liuq19 commented Jan 9, 2024

Well, this fail doesnt look.. pretty: https://github.com/cloudwego/sonic-rs/actions/runs/7423555607/job/20201409871?pr=55#step:5:15

But technically it's not an issue of the code in itself. It's the compiler issuing an illegal instruction, which is really peculiar

An occasional panic when compiling in macos@stable, it seems just a compile error. We could ignore it ~

@aumetra
Copy link
Contributor Author

aumetra commented Feb 26, 2024

Sorry for the long time without a response.
I have lost track of a bunch of things and unfortunately don't have enough time to finish this up.

Sorry again, but I'm closing this for now. Maybe I'll open it again in the future with a parser-level dispatch.

@aumetra aumetra closed this Feb 26, 2024
@aumetra aumetra deleted the runtime-detection branch May 24, 2024 10:38
@aumetra aumetra mentioned this pull request May 25, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

support CPU feature detection and dispatch in runtime
3 participants