feat: add GGML_UNARY_OP_ARGMAX Metal kernel #1019
Thanks, will merge these soon. Just need to sync the ggerganov/llama.cpp#10238 from
@ggerganov Thanks for the review. I added your changes:
I have a follow-up question regarding the tests/CI: is the
You have to enable the ARGMAX op in the Metal backend here:
Then you can test like this:
./bin/test-backend-ops -o ARGMAX -b Metal
Just tested the kernel with
@ggerganov I pushed a SIMD implementation of the kernel. All tests are passing :) I'd like to benchmark both the non-SIMD and SIMD implementations of the kernel. Is there an existing code snippet that benchmarks the latency and throughput of the kernel, as @slaren did in ggerganov/llama.cpp#10441?
You can obtain the performance measurements with
Thanks @slaren! I ran Here are the results: With SIMD vectorization:
Without SIMD vectorization:
This is indeed much faster! Thanks for the suggestion @slaren! It's OK on my side if you want to merge :)
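For readers following the SIMD discussion, here is a minimal CPU-side sketch of the two-phase reduction pattern that a SIMD-group argmax typically uses. It is not the kernel from this PR: the lane count `kLanes`, the `Candidate` struct, and the functions `pick` and `argmax_lanes` are hypothetical names chosen for illustration. Each simulated lane first scans a strided slice of the row, then the per-lane candidates are merged in a halving tree. In a Metal kernel that second step would usually be done with SIMD-group shuffle intrinsics rather than a loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Assumed SIMD-group width; illustrative only.
constexpr size_t kLanes = 32;

// A (value, index) pair carried through the reduction.
struct Candidate {
    float   val;
    int32_t idx;
};

// Keep the larger value; on equal values keep the lower index.
static Candidate pick(Candidate a, Candidate b) {
    if (b.val > a.val || (b.val == a.val && b.idx < a.idx)) {
        return b;
    }
    return a;
}

// Hypothetical helper: argmax of one row using a simulated lane reduction.
int32_t argmax_lanes(const std::vector<float>& row) {
    assert(!row.empty());
    std::array<Candidate, kLanes> lane;

    // Phase 1: lane l scans elements l, l + kLanes, l + 2*kLanes, ...
    for (size_t l = 0; l < kLanes; ++l) {
        Candidate best{-std::numeric_limits<float>::infinity(),
                       std::numeric_limits<int32_t>::max()};
        for (size_t i = l; i < row.size(); i += kLanes) {
            best = pick(best, Candidate{row[i], static_cast<int32_t>(i)});
        }
        lane[l] = best;
    }

    // Phase 2: tree reduction, halving the number of active lanes each step.
    for (size_t offset = kLanes / 2; offset > 0; offset /= 2) {
        for (size_t l = 0; l < offset; ++l) {
            lane[l] = pick(lane[l], lane[l + offset]);
        }
    }
    return lane[0].idx;
}
```

The explicit tie-break on the index keeps the result identical to a sequential left-to-right scan, which matters when comparing a parallel backend against a reference implementation.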
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* implemented argmax kernel
* tpig -> tgpig
* change to strides
* contiguous assertions
* kernel working and tested
* argmax simd parallel implementation
* added 2 new tests for argmax in test-backend-ops
* cosmit
* added 3 tests cases for perf eval
* add test_argmax in make_test_cases_perf
* Update test-backend-ops.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
This PR implements the Metal kernel used for the GGML_UNARY_OP_ARGMAX operation. It is necessary for Encodec.cpp to run on the Metal backend.
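As a point of reference for what the operation computes, the following is a minimal CPU sketch of a row-wise argmax. It assumes a contiguous row-major float input and 32-bit integer output indices. The function name `argmax_rows` is hypothetical and is not part of ggml.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not a ggml function): for each row of a
// contiguous row-major matrix, return the index of its largest element.
// Ties resolve to the lowest index.
std::vector<int32_t> argmax_rows(const std::vector<float>& src, size_t ncols) {
    assert(ncols > 0 && src.size() % ncols == 0);
    const size_t nrows = src.size() / ncols;
    std::vector<int32_t> dst(nrows);
    for (size_t r = 0; r < nrows; ++r) {
        const float* row = src.data() + r * ncols;
        size_t best = 0;
        for (size_t c = 1; c < ncols; ++c) {
            if (row[c] > row[best]) {
                best = c;
            }
        }
        dst[r] = static_cast<int32_t>(best);
    }
    return dst;
}
```

A GPU kernel would typically assign one threadgroup per row and parallelize the inner loop; this sequential version is only meant as a correctness baseline.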