memmov perf bug #2

Open
dmpots opened this issue Nov 26, 2024 · 0 comments

I am looking at the AOCL memcpy/memmove functions and I think I found a perf bug in memmove.
In memmove there is a check for aligned src and dst so that vmovaps can be used for aligned loads/stores. But the check compares the whole src pointer against the low bits of dst.

uint32_t dst_align = ((size_t)dst & (YMM_SZ - 1));
...
if (!((uintptr_t)src ^ dst_align))

Compare that to the check in memcpy that looks at the low bits of both.

if ((((size_t)src & (YMM_SZ - 1)) | dst_align) == 0)

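I think this prevents the aligned path from being taken in memmove: the XOR is zero only if the entire src address equals dst_align, which is always less than YMM_SZ, so it will not happen for a real pointer. I tried changing it to match the check in memcpy, but I did not see much difference in perf in the micro-benchmarks. Probably still worth fixing though.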
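
To make the difference concrete, here is a minimal sketch that isolates the two conditions quoted above into standalone helpers. The function names `memmove_style_check` and `memcpy_style_check` are hypothetical and not from AOCL, and `YMM_SZ` is assumed to be 32 (the byte width of a YMM register); only the two expressions themselves come from the snippets in this issue.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: YMM_SZ is the YMM register width in bytes (256 bits). */
#define YMM_SZ 32

/* Hypothetical helper: the condition as quoted from memmove.
 * It XORs the full src address with the low bits of dst, so it is
 * nonzero only when the src address itself equals dst_align. */
static int memmove_style_check(const void *dst, const void *src)
{
    uint32_t dst_align = ((size_t)dst & (YMM_SZ - 1));
    return !((uintptr_t)src ^ dst_align);
}

/* Hypothetical helper: the condition as quoted from memcpy.
 * It masks the low bits of both pointers and is nonzero only
 * when both are YMM_SZ-aligned. */
static int memcpy_style_check(const void *dst, const void *src)
{
    uint32_t dst_align = ((size_t)dst & (YMM_SZ - 1));
    return (((size_t)src & (YMM_SZ - 1)) | dst_align) == 0;
}
```

With two pointers into a 32-byte-aligned buffer, say `buf` and `buf + 32`, the memcpy-style check should return 1 while the memmove-style check returns 0, since the address of `buf + 32` is not equal to 0.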
