ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels #9217
Merged
remove prints from the low-level code
commit 7276e2b31c47d4d3776851cee2397ae3e920d459
I'm wondering if these function calls are properly inlined. AFAIK with LTO enabled, they should be, but instead of relying on the linker to do it for us, it may be better to read the value once into a static variable and check that variable from then on. The function call overhead is probably negligible, but since we are in a hot loop, it might make a difference. What do you think?
@ggerganov Thanks for the review and sorry for the late response as I was on vacation. I'll look into your suggestion.
One possible way to reduce the overhead of function calls such as ggml_cpu_has_neon() is to cache the results using static variables, as you suggested:
To prevent race conditions, the values of neon_support_checked and neon_supported need to be protected by a mutex or atomic operations:
By marking is_neon_supported() as inline, we may reduce the function call overhead. However, since the function still contains a mutex, the overall performance improvement could be limited due to the cost of acquiring and releasing the lock.
Another possible optimization might be to inline the ggml_cpu_has_neon() function itself, although that might be beyond the scope of this PR.
I'm interested to hear what you think, and I'm happy to consider any suggestions or improvements you may have.
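For illustration, the mutex-protected caching described in this comment could look roughly like the sketch below. It is not the code originally posted in the thread. The names `is_neon_supported`, `neon_support_checked`, and `neon_supported` come from the comment; `cpu_has_neon_stub` and `probe_calls` are hypothetical stand-ins so the snippet compiles without ggml (in the real code the probe would be `ggml_cpu_has_neon()`), and the use of a pthread mutex is an assumption.

```c
#include <pthread.h>
#include <stdbool.h>

// Hypothetical stand-in for ggml_cpu_has_neon(); counts how often it runs.
static int probe_calls = 0;

static int cpu_has_neon_stub(void) {
    probe_calls++;
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

static pthread_mutex_t neon_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool neon_support_checked = false;
static bool neon_supported = false;

// Returns the cached result; the probe runs only on the first call.
// Every call still takes the lock, which is the cost discussed above.
static inline bool is_neon_supported(void) {
    pthread_mutex_lock(&neon_mutex);
    if (!neon_support_checked) {
        neon_supported = cpu_has_neon_stub() != 0;
        neon_support_checked = true;
    }
    bool result = neon_supported;
    pthread_mutex_unlock(&neon_mutex);
    return result;
}
```

The lock guarantees the probe runs exactly once, but each call in the hot loop pays for a lock/unlock pair, so this variant may well be slower than calling the original function directly.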
Nah, a mutex is overkill. Let's merge it as it is. You can easily check whether the function calls add any overhead by replacing them with `if (true)`. If they do, you can try to find a way to do thread-safe static init without synchronization primitives.
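One possible shape for a lock-free cached check is sketched below, using a single C11 atomic with relaxed ordering. This is an assumption about what such an approach might look like, not something proposed or merged in this PR. `probe_neon_stub`, `lockfree_probe_calls`, `neon_state`, and `is_neon_supported_lockfree` are hypothetical names; the stub stands in for `ggml_cpu_has_neon()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical stand-in for ggml_cpu_has_neon(); the counter is atomic
// because several threads may run the probe concurrently on first use.
static atomic_int lockfree_probe_calls = 0;

static int probe_neon_stub(void) {
    atomic_fetch_add_explicit(&lockfree_probe_calls, 1, memory_order_relaxed);
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

// -1 = not probed yet, 0 = NEON unavailable, 1 = NEON available.
static atomic_int neon_state = -1;

static inline bool is_neon_supported_lockfree(void) {
    int state = atomic_load_explicit(&neon_state, memory_order_relaxed);
    if (state < 0) {
        // Threads racing here all compute the same answer, so duplicate
        // probes are harmless and no lock is needed.
        state = probe_neon_stub() ? 1 : 0;
        atomic_store_explicit(&neon_state, state, memory_order_relaxed);
    }
    return state == 1;
}
```

Relaxed ordering is enough here because the cached integer is the only data being shared; no other memory is published alongside it. The trade-off is that the probe may run more than once if several threads hit the first call at the same time.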
I’ve done some overhead tests for the function calls and didn’t observe any statistically significant performance differences. Given that, I’m fine with merging the code as is after I rebase it. Please let me know if you think I should do anything differently.
Ok, let's merge after rebase.