Various Fixes and enhancements in x86 intrinsics #1594
Updated the intrinsics list from version 3.4 to 3.6.8. Added a missing-x86.md file to track progress.
Fixed reduce-add and reduce-mul, and load/store of mask32 and mask64. Added preserves-flags to mov asm. Fixed the missing list. Fixed `_mm_loadu_si64`. Added `assert_instr`.
Added some tests. Fixed incorrect target-features, and verification code for target-features. Removed all MMX support from verification.
`_mm512_kunpackb` was implemented wrong. Also, `simd_reduce_max` uses `maxnum` for comparison, which adheres to IEEE 754, but Intel specifically says that its instructions do NOT adhere to IEEE 754 for NaNs, so the reduction can give wrong results.
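The NaN mismatch described in the last commit message can be illustrated with a scalar model. This is a minimal sketch, not the code from this PR: `x86_style_max` and `maxnum_style_max` are hypothetical helper names, and the x86 rule modelled here (return the second operand unless the first is strictly greater) is an assumption taken from Intel's description of `MAXPS`/`MAXSS`.

```rust
/// Scalar model of the assumed x86 `MAXPS`/`MAXSS` rule: the second operand
/// is returned unless `a > b`. Any comparison with NaN is false, so a NaN in
/// either position selects `b`.
fn x86_style_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// `maxnum`-style maximum: `f32::max` ignores a single NaN operand and
/// returns the other value.
fn maxnum_style_max(a: f32, b: f32) -> f32 {
    a.max(b)
}
```

With these definitions, `x86_style_max(1.0, f32::NAN)` is NaN while `maxnum_style_max(1.0, f32::NAN)` is `1.0`, so a reduction built on `maxnum` can silently drop a NaN that the x86-style rule would propagate.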
/// must be aligned on a 32-byte boundary or a general-protection exception may be generated. To
/// minimize caching, the data is flagged as non-temporal (unlikely to be used again soon)
///
/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_stream_load_si256)
This (and all other AVX2 non-temporal operations) should get the same safety comment that the older non-temporal stores have. See e.g. here.
I checked and only non-temporal stores have special memory orderings on x86. x86 non-temporal loads work just like normal loads.
@Amanieu said that this doesn't apply to streaming loads, only streaming stores.
Oh, I didn't realize non-temporal loads even are a thing. More nightmare waiting to happen, I guess...
- […] `missing-x86.md` for ease of implementation
- […] `objdump` from binutils, and binutils is a dependency of GCC). Add the `x86_64-pc-windows-gnu` target in CI
- […] `simd_reduce_add_unordered` and `simd_reduce_mul_unordered` as Intel specifies a strict associativity. Follow GCC and hand-implement the associativity ourselves (_mm512_reduce_add_ps and friends are setting fast-math flags they should not set #1533)
- […] `_load_mask32` etc in AVX512BW (they should have taken a `__mmask32`/`__mmask64` pointer, but took `u32`/`u64` pointer)
- […] `preserves_flags` to the `asm!` blocks for moves
- […] `_mm_loadu_si64` (it had target-feature sse, but needs sse2), `_mm256_extract_epi64`, `_mm256_extract_epi32`, `_mm256_cvtsi256_si32` (these had target-feature avx2, but need avx).
- […] `_mm_cvtt` intrinsics (they were actually calling vcvtss2si, when they should call vcvttss2si)
- […] `simd-x86-updates` (Tracking Issue for Missing BMI1, AVX2, SSE2, SSE4.1, SSE4a and TBM intrinsics rust#126936)
- […] `_mm512_kunpackb`
- […] `maxnum` from LLVM)
- […] `armv7-unknown-linux-gnueabihf` and `x86_64-unknown-linux-gnu-emulated`)

Modifying fma has been moved to #1597

Masked load/stores are on standby due to rust-lang/rust#126919
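Why a fixed associativity matters for the reduce-add intrinsics can be seen in a small scalar sketch. The helper names below are hypothetical, and the halving order (lane `j` added to lane `j + len/2`) is an assumption for illustration; it is not confirmed here as the exact order used by GCC or by this PR.

```rust
/// Left-to-right sum: ((v0 + v1) + v2) + ...
fn reduce_add_sequential(values: &[f32]) -> f32 {
    values.iter().fold(0.0, |acc, &x| acc + x)
}

/// Halving sum: lane j is added to lane j + len/2 until one lane remains.
/// Assumes a non-empty, power-of-two number of lanes.
fn reduce_add_halving(values: &[f32]) -> f32 {
    assert!(values.len().is_power_of_two());
    let mut lanes = values.to_vec();
    let mut len = lanes.len();
    while len > 1 {
        len /= 2;
        for j in 0..len {
            lanes[j] += lanes[j + len];
        }
    }
    lanes[0]
}
```

For the input `[1e8, 1.0, -1e8, 1.0]`, the sequential order loses the first `1.0` to rounding and yields `1.0`, while the halving order cancels the large terms first and yields `2.0`. An "unordered" reduction lets the compiler pick either, which is why a hand-written order is needed when the specification fixes one.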