I tried to compile the code with my ClangCL with the -march=znver4 option, and it seems it actually optimizes things just fine. For instance, the following is the codegen for alg64::lemire_branchless:
The default ClangCL release configuration produces the following benchmark results for 64 bits:
It is clear that all Lemire methods suffer from some kind of codegen issue.
Pausing on a breakpoint in the Lemire method gives the following disassembly for: auto const r = wuint::umul128(n, UINT64_C(1844674407370955162));
The disassembly shows that wuint::umul128 is not inlined.
Adding [[clang::always_inline]] to uint128 umul128(std::uint64_t x, std::uint64_t y) noexcept { produces the following results:
The same code now produces the following disassembly:
This is fine by itself, but it is probably not what was intended if the goal was to have the MULX instruction used.
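For readers who want to reproduce the inlining experiment outside the benchmark, here is a minimal sketch of a force-inlined 64x64 -> 128-bit multiply. It is not the repository's implementation: the namespace sketch, the macro SKETCH_ALWAYS_INLINE, the function body, and sanity_check are all assumptions made for illustration. Only the signature and the [[clang::always_inline]] attribute come from the report above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: the clang spelling is the attribute used in this issue;
// the other branches only keep the sketch compiling on other compilers.
#if defined(__clang__)
#  define SKETCH_ALWAYS_INLINE [[clang::always_inline]] inline
#elif defined(__GNUC__)
#  define SKETCH_ALWAYS_INLINE [[gnu::always_inline]] inline
#else
#  define SKETCH_ALWAYS_INLINE inline
#endif

namespace sketch {  // hypothetical namespace, not the project's wuint

struct uint128 {
    std::uint64_t high;
    std::uint64_t low;
};

// Full 128-bit product of two 64-bit integers.
SKETCH_ALWAYS_INLINE uint128 umul128(std::uint64_t x, std::uint64_t y) noexcept {
#if defined(__SIZEOF_INT128__)
    __extension__ typedef unsigned __int128 u128;
    u128 const p = static_cast<u128>(x) * y;
    return {static_cast<std::uint64_t>(p >> 64), static_cast<std::uint64_t>(p)};
#else
    // Portable fallback: schoolbook multiplication on 32-bit halves.
    std::uint64_t const a = x >> 32, b = x & 0xffffffffu;
    std::uint64_t const c = y >> 32, d = y & 0xffffffffu;
    std::uint64_t const ac = a * c, bc = b * c, ad = a * d, bd = b * d;
    std::uint64_t const mid = (bd >> 32) + (ad & 0xffffffffu) + (bc & 0xffffffffu);
    return {ac + (mid >> 32) + (ad >> 32) + (bc >> 32),
            (mid << 32) | (bd & 0xffffffffu)};
#endif
}

// Same call shape as the line quoted in the issue.
inline void sanity_check() {
    auto const r = umul128(10, UINT64_C(1844674407370955162));
    assert(r.high == 1);  // 10 * 1844674407370955162 == 2^64 + 4
    assert(r.low == 4);
}

}  // namespace sketch
```

Whether the compiler then lowers the multiply to MUL or MULX depends on the enabled target features, not on the attribute; the attribute only removes the call.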
Adding target_compile_options(rtz_benchmark_exe PRIVATE "/arch:AVX2") to the CMake configuration makes the compiler emit MULX, with the following disassembly:
All results were obtained on an AMD Ryzen 4 7900 running Windows 10 Pro, using Visual Studio 2022 Community with ClangCL 17.0.3.
I expect that chasing such compiler differences is not really useful, so this feedback can be closed as not a defect right away. I just found it interesting enough to share.