-
Notifications
You must be signed in to change notification settings - Fork 293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add WebAssembly SIMD Support #269
base: master
Are you sure you want to change the base?
Conversation
Chrome 91 just released yesterday with stable support for WASM SIMD. Firefox will shortly follow as well. Rust also intends to stabilize the intrinsics soon. So I decided to go ahead and already port SIMD support for a couple of crates including now hashbrown.
I'm curious if this actually improves performance on WASM compared to the generic version. One concern I have is that Regarding the stability of the intrinsics, I'm happy to add this under the nightly feature. |
Seems like this is what it lowers to: https://gcc.godbolt.org/z/7chfTPsjb Also same lowering seen here in V8: arm64/code-generator-arm64.cc There's some benchmarks here: Wasm SIMD bitmask slides Though they are mostly comparing against expressing it with other WASM SIMD instructions, whereas in hashbrown we would just avoid SIMD entirely on WASM otherwise. There's more discussion and some benchmarks in the original PR for adding the instruction to WASM: WebAssembly/simd#201 I can't really benchmark it myself on an Aarch64 device (Safari doesn't yet support WASM SIMD, and the iPhone is the only Aarch64 device with some sort of WASM support that I have). However I also noticed a few things:
|
I did some comparisons on my local macbook (x86_64) and an arm64 machine I had access to using Wasmtime. I don't think Wasmtime's simd backend has seen a ton of optimization yet, but this may at least be somewhat representative. I compared x86_64 - baseline vs baseline-simd128 - mixed bag
x86_64 - baseline vs everything - clear win
aarch64 - baseline vs baseline-simd128 - mostly a loss
aarch64 - baseline vs everything - huge loss
It was mostly just easiest to collect numbers with Wasmtime, I don't know how to easily collect numbers with v8 and/or Spidermonkey which would probably have different results on AArch64 |
@alexcrichton Thanks for benchmarking this. I also opened #271 which switches the groups not to v128, but from u32 to u64, so that possibly should itself be a decently large win on both architectures. So if we get some numbers for that PR maybe we can close this one in favor of that. (Although that one is orthogonal to this one anyway) |
This is unfortunately a situation where the underlying architecture leaks out: we can either optimize for x86 or arm, but not both. The core of the issue is that the x86 A few years ago I attempted to implement NEON support for hashbrown (https://github.com/Amanieu/hashbrown/blob/neon/src/raw/neon.rs) but the results were about the same as the generic integer version so I dropped it. I wonder how an equivalent algorithm would perform in wasm for x86 and aarch64. In the end I'm not sure what the best approach to use here is. We can't optimize for a specific target architecture since wasm specifically hides this from us. |
If interested, the best thing to do here would probably be to replicate these results on a likely-more-battle-tested-and-production-ready-simd-engine, aka v8 or Spidermonkey. After that if it's still an issue opening an issue on https://github.com/WebAssembly/simd would be the way to go most likely. |
☔ The latest upstream changes (presumably #430) made this pull request unmergeable. Please resolve the merge conflicts. |
Chrome 91 just released yesterday with stable support for WASM SIMD. Firefox will shortly follow as well. Rust also intends to stabilize the intrinsics soon. So I decided to go ahead and already port SIMD support for a couple of crates including now hashbrown.
This can't be merged until the WebAssembly intrinsics are stabilized. Technically we could merge it early under the nightly feature flag, but I'd expect the intrinsics to change a bit in the near future before stabilizing, so merging early might not make much sense.