
feat: add GGML_UNARY_OP_ARGMAX Metal kernel #1019

Merged: 11 commits merged into ggerganov:master from the argmax_metal_kernel branch on Dec 2, 2024

Conversation

PABannier (Contributor)

This PR implements the Metal kernel used for the GGML_UNARY_OP_ARGMAX operation.

It is necessary for Encodec.cpp to run on the Metal backend.

@ggerganov (Owner)

Thanks, will merge this soon. Just need to sync ggerganov/llama.cpp#10238 from llama.cpp first, to avoid resolving conflicts manually.

@PABannier (Contributor, Author)

@ggerganov Thanks for the review. I incorporated your suggested changes:

  1. Taking the strides into account, to avoid asserting that src0 is contiguous;
  2. Parallelizing over threadgroups instead of threads (a rough sketch of this approach follows below).
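For illustration, here is a minimal sketch of a threadgroup-parallel argmax kernel in Metal Shading Language. This is not the exact kernel from the PR: the name, argument layout, and the assumption that the threadgroup size is a power of two are all simplifications.

```metal
#include <metal_stdlib>
using namespace metal;

// Sketch: one threadgroup per row. Each thread scans a strided slice of the
// row, then the per-thread results are combined with a tree reduction in
// threadgroup memory. Assumes ntg (threads per threadgroup) is a power of two.
kernel void kernel_argmax_sketch(
        device const char    * src0,
        device       int32_t * dst,
        constant     int64_t  & ne00,                     // elements per row
        constant     uint64_t & nb01,                     // byte stride between rows
        threadgroup  float    * shmem_val [[threadgroup(0)]],
        threadgroup  int32_t  * shmem_idx [[threadgroup(1)]],
        uint tgpig [[threadgroup_position_in_grid]],
        uint tpitg [[thread_position_in_threadgroup]],
        uint ntg   [[threads_per_threadgroup]]) {

    device const float * row = (device const float *)(src0 + tgpig*nb01);

    // per-thread partial argmax over a strided slice of the row
    float   best_val = -INFINITY;
    int32_t best_idx = 0;
    for (int64_t i = tpitg; i < ne00; i += ntg) {
        if (row[i] > best_val) {
            best_val = row[i];
            best_idx = (int32_t) i;
        }
    }

    shmem_val[tpitg] = best_val;
    shmem_idx[tpitg] = best_idx;
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // tree reduction: halve the number of active threads each step
    for (uint s = ntg/2; s > 0; s >>= 1) {
        if (tpitg < s && shmem_val[tpitg + s] > shmem_val[tpitg]) {
            shmem_val[tpitg] = shmem_val[tpitg + s];
            shmem_idx[tpitg] = shmem_idx[tpitg + s];
        }
        threadgroup_barrier(mem_flags::mem_threadgroup);
    }

    if (tpitg == 0) {
        dst[tgpig] = shmem_idx[0];
    }
}
```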

I have a follow-up question regarding the tests/CI: is the argmax Metal kernel tested? Looking at the logs, I only found one place where argmax is mentioned: 19: ARGMAX(type=f32,ne=[10,100,1,1]): not supported [BLAS]. Am I missing something?

@ggerganov (Owner)

> I have a follow-up question regarding the tests/CI: is the argmax Metal kernel tested? Looking at the logs, I only found one place where argmax is mentioned: 19: ARGMAX(type=f32,ne=[10,100,1,1]): not supported [BLAS]. Am I missing something?

You have to enable the ARGMAX op in the Metal backend here:

https://github.com/ggerganov/llama.cpp/blob/3ee6382d48b07b31e64983969c16019490e19740/ggml/src/ggml-metal/ggml-metal.m#L949
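For context, that line sits in the backend's op-support check. A minimal sketch of the enablement, assuming the switch-based structure ggml-metal.m used at the time (the exact function name and surrounding cases may differ across versions):

```c
// in src/ggml-metal/ggml-metal.m, inside the op-support check
// (ggml_metal_supports_op or equivalent):
switch (op->op) {
    // ...
    case GGML_OP_ARGMAX:
        return true; // enable dispatch to the new Metal kernel
    // ...
    default:
        return false;
}
```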

Then you can test like this:

./bin/test-backend-ops -o ARGMAX -b Metal

@PABannier (Contributor, Author)

Just tested the kernel with ./bin/test-backend-ops -o ARGMAX -b Metal and got:

  Device description: Apple M1 Pro
  Device memory: 10922 MB (10916 MB free)

  ARGMAX(type=f32,ne=[10,100,1,1]): OK
  1914/1914 tests passed
  Backend Metal: OK

  ggml_metal_free: deallocating
  Backend 2/3: BLAS
    Skipping
  Backend 3/3: CPU
    Skipping
  3/3 backends passed
  OK

@PABannier (Contributor, Author)

@ggerganov I pushed a SIMD implementation of the kernel. All tests are passing :)
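(For context, "SIMD implementation" here means reducing within a simdgroup via Metal's shuffle intrinsics rather than going through threadgroup memory. A minimal hedged sketch of that reduction step, not the exact kernel code from this PR, assuming a 32-wide simdgroup as on Apple GPUs:)

```metal
#include <metal_stdlib>
using namespace metal;

// Reduce a per-thread (value, index) pair across one simdgroup.
// best_val/best_idx hold each thread's partial argmax on entry.
inline void argmax_simd_reduce(thread float & best_val, thread int32_t & best_idx) {
    for (ushort off = 16; off > 0; off >>= 1) {
        const float   other_val = simd_shuffle_down(best_val, off);
        const int32_t other_idx = simd_shuffle_down(best_idx, off);
        if (other_val > best_val) {
            best_val = other_val;
            best_idx = other_idx;
        }
    }
    // lane 0 of each simdgroup now holds that simdgroup's (value, index) maximum
}
```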

I'd like to benchmark both the non-SIMD and SIMD implementations of the kernel. Is there an existing snippet of code that benchmarks the latency and throughput of the kernel, as @slaren did in ggerganov/llama.cpp#10441?

@slaren (Collaborator) commented Nov 29, 2024

You can obtain the performance measurements with test-backend-ops -o ARGMAX perf. The same tests that I used should already be there, but you can add your own to make_test_cases_perf.
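For illustration, the perf entries follow the same pattern as the existing entries in tests/test-backend-ops.cpp. A hedged sketch, assuming the test_argmax helper takes a type and an ne shape like the other test_case subclasses (the shapes mirror the ones benchmarked below):

```cpp
// inside make_test_cases_perf() in tests/test-backend-ops.cpp
test_cases.emplace_back(new test_argmax(GGML_TYPE_F32, { 100, 10, 1, 1}));
test_cases.emplace_back(new test_argmax(GGML_TYPE_F32, {1024, 12, 1, 1}));
test_cases.emplace_back(new test_argmax(GGML_TYPE_F32, {5438,  3, 1, 1}));
```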

@PABannier (Contributor, Author) commented Nov 29, 2024

Thanks @slaren! I ran test-backend-ops perf -o ARGMAX -b Metal after adding the 3 test cases in make_test_cases_perf.

Here are the results:

With SIMD vectorization:

ARGMAX(type=f32,ne=[100,10,1,1]):   286720 runs -   3.53 us/run -  3 kB/run - 1.07 GB/s
ARGMAX(type=f32,ne=[1024,12,1,1]):   90112 runs -  11.38 us/run - 48 kB/run - 4.03 GB/s
ARGMAX(type=f32,ne=[5438,3,1,1]):    65536 runs -  16.11 us/run - 63 kB/run - 3.77 GB/s

Without SIMD vectorization:

ARGMAX(type=f32,ne=[100,10,1,1]):    73728 runs -  14.75 us/run -  3 kB/run - 0.26 GB/s
ARGMAX(type=f32,ne=[1024,12,1,1]):   16384 runs - 120.66 us/run - 48 kB/run - 0.38 GB/s
ARGMAX(type=f32,ne=[5438,3,1,1]):     8192 runs - 626.23 us/run - 63 kB/run - 0.10 GB/s
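(Quick sanity check on these numbers: for the 1024x12 case, 48 kB/run = 48*1024 B, and 49152 B / 11.38 us is about 4.32e9 B/s, i.e. about 4.02 GiB/s, matching the reported 4.03 GB/s. The SIMD speedup ranges from roughly 4x on the smallest shape to roughly 39x on the largest.)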

This is indeed much faster! Thanks for the suggestion, @slaren!

It's ok on my side if you want to merge :)

@slaren merged commit 589fed1 into ggerganov:master on Dec 2, 2024 (4 checks passed).
ypapadop-amd pushed a commit to ypapadop-amd/ggml referencing this pull request on Dec 2, 2024, with the squashed commit message:
* implemented argmax kernel

* tpig -> tgpig

* change to strides

* contiguous assertions

* kernel working and tested

* argmax simd parallel implementation

* added 2 new tests for argmax in test-backend-ops

* cosmit

* added 3 test cases for perf eval

* add test_argmax in make_test_cases_perf

* Update test-backend-ops.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
@PABannier deleted the argmax_metal_kernel branch on December 3, 2024 at 11:13.