Implement Thumb-2 optimized memcpy/memset #67
Comments
lk implements arm-m optimized memcpy and memset routines in git commit littlekernel/lk@33b94d9 |
@jserv The profile result: |
It looks so weird. Can you explain? |
@jserv The implementation is in the branch. My approach is to measure each case, aligned and unaligned, five times and take the average time. Assuming my approach is correct, the data imply that the unaligned case performs better than the aligned case after the optimization on the STM32F407. |
@gapry To clarify the performance gain, please compare the optimized memcpy routines with a plain byte-oriented C version. |
@jserv What does "plain byte-oriented" mean? |
The simplest and inefficient implementation of
|
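For reference, a plain byte-oriented copy might look like the sketch below. It moves one byte per loop iteration and does no alignment handling or word-sized accesses, which is what makes it a useful baseline. The name `memcpy_bytewise` is hypothetical and is not taken from the f9-kernel sources.

```c
#include <stddef.h>

/* Plain byte-oriented copy: one byte per iteration, no alignment
 * handling, no word-sized loads or stores.
 * memcpy_bytewise is a hypothetical name used only for illustration. */
void *memcpy_bytewise(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;

    return dest;
}
```

When benchmarking a baseline like this, note that an optimizing compiler may recognize the loop and replace it with a call to the library `memcpy`. With GCC, building with `-fno-tree-loop-distribute-patterns` is one way to avoid that.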
@jserv For now, I use the DWT to measure the elapsed clock cycles. You can check the commit: https://github.com/gapry/f9-kernel/commit/33e58dfcb1105140365132269c596763531e9ede and the complete implementation: https://github.com/gapry/f9-kernel/blob/benchmark_memcpy/benchmark/benchmark.c |
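A minimal sketch of cycle counting with the DWT on a Cortex-M3/M4, independent of the linked `benchmark.c` (which may do this differently). The register addresses and bit positions are the architectural ARMv7-M ones; the helper names `cyccnt_enable` and `cycles_elapsed` are hypothetical. Registers are passed in as pointers so the logic can also be exercised off-target.

```c
#include <stdint.h>

/* Architectural ARMv7-M addresses (Cortex-M3/M4). */
#define DEMCR_ADDR       0xE000EDFCu  /* Debug Exception and Monitor Control */
#define DWT_CTRL_ADDR    0xE0001000u  /* DWT control register */
#define DWT_CYCCNT_ADDR  0xE0001004u  /* DWT cycle counter */

#define DEMCR_TRCENA       (1u << 24) /* enables the DWT unit */
#define DWT_CTRL_CYCCNTENA (1u << 0)  /* starts the cycle counter */

/* Hypothetical helper: enable tracing, reset the counter, start counting. */
static inline void cyccnt_enable(volatile uint32_t *demcr,
                                 volatile uint32_t *dwt_ctrl,
                                 volatile uint32_t *dwt_cyccnt)
{
    *demcr |= DEMCR_TRCENA;
    *dwt_cyccnt = 0;
    *dwt_ctrl |= DWT_CTRL_CYCCNTENA;
}

/* Hypothetical helper: unsigned subtraction stays correct across a
 * single wraparound of the 32-bit counter. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* On target, usage would look roughly like:
 *
 *   volatile uint32_t *cyccnt = (volatile uint32_t *)DWT_CYCCNT_ADDR;
 *   cyccnt_enable((volatile uint32_t *)DEMCR_ADDR,
 *                 (volatile uint32_t *)DWT_CTRL_ADDR, cyccnt);
 *   uint32_t t0 = *cyccnt;
 *   memcpy(dst, src, len);
 *   uint32_t cycles = cycles_elapsed(t0, *cyccnt);
 */
```

Reading the counter itself costs a few cycles, so for very short copies it may be worth measuring an empty region first and subtracting that overhead.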
@gapry I don't think your benchmarking is valid since it doesn't represent the variance. There must be something wrong. |
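One way to make the spread visible is to report the sample variance, minimum and maximum alongside the mean. The sketch below uses Welford's one-pass update. The names `sample_stats` and `summarize_cycles` are hypothetical, and the arithmetic is done in `double`, which is emulated in software on a Cortex-M4, so it may be more practical to dump the raw cycle counts and run this on the host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of repeated cycle-count measurements. */
struct sample_stats {
    double mean;
    double variance;   /* sample variance, (n - 1) denominator */
    uint32_t min;
    uint32_t max;
};

/* Welford's one-pass algorithm: updates the running mean and the sum of
 * squared deviations (m2) as each sample arrives. */
struct sample_stats summarize_cycles(const uint32_t *samples, size_t n)
{
    struct sample_stats st = {0.0, 0.0, 0, 0};
    double m2 = 0.0;

    if (n == 0)
        return st;

    st.min = st.max = samples[0];
    for (size_t i = 0; i < n; i++) {
        double x = (double)samples[i];
        double delta = x - st.mean;

        st.mean += delta / (double)(i + 1);
        m2 += delta * (x - st.mean);

        if (samples[i] < st.min)
            st.min = samples[i];
        if (samples[i] > st.max)
            st.max = samples[i];
    }

    if (n > 1)
        st.variance = m2 / (double)(n - 1);

    return st;
}
```

With only five runs per case the variance estimate is itself noisy, so collecting more samples per configuration would likely give a clearer picture.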
Directory `kernel/lib` contains the implementation of `memcpy` and `memset`, but it is too generic. We can utilize several ARM Cortex-M3/M4 specific features to optimize: