I am looking at the AOCL memcpy/memmove functions and I think I found a perf bug in the memmove function.
In the memmove function there is a check for aligned src and dst so that vmovaps can be used for aligned loads/stores. But the check combines the whole src pointer with the low bits of dst, rather than only the low bits of src.
Compare that to the check in memcpy, which looks at the low bits of both:
if ((((size_t)src & (YMM_SZ - 1)) | dst_align) == 0)
I think this is going to prevent the aligned path from being taken in memmove. I tried changing it to match the check in memcpy, but I did not see much difference in perf in the micro-benchmarks. It is probably still worth fixing, though.
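To make the difference concrete, here is a minimal sketch of the two checks side by side. The memmove expression is not quoted in this report, so `aligned_whole_src` is only an assumed reconstruction from the description above. The function names, the use of `uintptr_t` addresses, and the definition of `dst_align` as `dst & (YMM_SZ - 1)` are likewise illustrative assumptions, not the library's actual code.

```c
#include <stdint.h>

#define YMM_SZ 32 /* bytes in a 256-bit YMM register */

/* Assumed shape of the memmove check (hypothetical reconstruction):
 * the whole src address is ORed with the low bits of dst, so the
 * result can only be zero when src itself is 0. */
static int aligned_whole_src(uintptr_t src, uintptr_t dst)
{
    uintptr_t dst_align = dst & (YMM_SZ - 1); /* assumed definition */
    return (src | dst_align) == 0;
}

/* Shape of the memcpy check quoted above: only the low bits of
 * src and dst are tested, so any two 32-byte-aligned addresses pass. */
static int aligned_low_bits(uintptr_t src, uintptr_t dst)
{
    uintptr_t dst_align = dst & (YMM_SZ - 1); /* assumed definition */
    return ((src & (YMM_SZ - 1)) | dst_align) == 0;
}
```

Under these assumptions, for src = 64 and dst = 128 (both 32-byte aligned), `aligned_low_bits` accepts the pair while `aligned_whole_src` rejects it, which would match the suspicion that the aligned path is effectively never taken.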