-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
XXH128 #29
Comments
xxh128 and also SIMD for xxh64 would certainly be interesting. Unfortunately, Safari still doesn't support WASM's SIMD, and I think that all major browsers should be supported without any feature detection (with https://github.com/GoogleChromeLabs/wasm-feature-detect) because that would require shipping multiple WASM binaries. |
@jungomi I'm looking forward to this library using SIMD instructions. IMO it might be worth implementing that already even if Safari doesn't support those yet, there are use cases where Safari isn't a target engine anyway (Node-only or Electron-only apps for example). |
Yes, absolutely. But I personally don't really have the time right now to look into the WebAssembly SIMD and to port xxh3 implementation to WebAssembly. I'm of course open for any contributions, but as long as no one is in dire need for it (more likely the xxh128 than just the improvements from SIMD) or just eager to implement it, I wouldn't expect it to happen immediately. |
I've gone through an initial implementation of one-shot SIMD 64b xxh3 for performance validation purposes and unfortunately the results aren't particularly promising. Node 19.7 on macOS/aarch64: (I don't currently own an x64 machine so I can't validate the performance differences that exist on the v8 SSE implementation of WASM SIMD—I could publish my branch if somebody else would like to check)
With larger input sizes demonstrating the same pattern as the 10k benchmark with linear decreases in speed. It's also worth mentioning that node 19.7 appears to have a sizable performance improvement over 18.x on bigint handling which leads to xxh64 matching the low-input-size performance of xxh32. I have outstanding work to do to try to improve the performance of the xxh3 implementation on input byte sizes >240 by 1. unrolling some loops and 2. moving accumulators from memory to locals. That said, the lagging 100b benchmark would be unaffected by these improvements. The primary operation in use on that 100b case (and a fairly critical one in general for xxh3) is a u64 multiply yielding a u128, which isn't well supported by the WASM instruction set. Originally I ported the xxHash scalar implementation of that operation directly to WASM, but after seeing these results I attempted a vectorized version myself. Unfortunately that vectorized implementation didn't appear to provide any performance advantage. I'm very new to SIMD (which is part of why I attempted this implementation) so I'll attach that function directly here in case anybody has a suggestion for improvement. Algorithm reference: https://github.com/Cyan4973/xxHash/blob/dev/xxhash.h#L3784
Edit/Update: Manually unrolling loops does not appear to materially alter performance. |
As a follow on, I've pushed my work-in-progress branch: https://github.com/marcusdarmstrong/xxhash-wasm/tree/xxh3 Beyond the above discussion, the branch contains two versions of 64b xxh3, a more orderly port and the That additional optimization work was able to drive aarch64 performance into being a win or flat at almost every benchmark, however, I was able to run a test on an old x64 system and the results were disheartening. Benchmark results from node 16.15 (as this is the latest version of node the x64 machine supports)
x64:
As a result of these numbers my intent would be to not bother trying to get this into a publishable state, as I don't see much of a path to further optimizations—but if anybody else sees this and wants to give it a shot, feel free to reach out and chat. |
I don't think testing with node 16.15 is going to be very indicative... |
I could try running it on a EPYC machine. But I don't see any "bench" script in the package.json. How do I run the benchmarks? |
The |
|
wasm-opt was missing in deps for building |
Mixed resuts:
|
I suspect SIMD overhead is too large for inputs smaller than 64 bytes. |
SIMD is only really in use for inputs > 240b. The underlying driver of performance difficulties at smaller input byte sizes is that wasm doesn't have a 64b*64b=>128b mul operation, while the C version of xxhash is able to use intrinsics to access that operation on x64 and aarch64. |
Checking in here, there has been spec progress on the blocker here. There's now a wide-arithmetic spec in phase 2 (with reference implementations complete) that includes the critical |
Wasm supports simd. Could we also have a xxh128 variant?
The text was updated successfully, but these errors were encountered: