
[SYCL] fallback mmvq #9088

Merged: 3 commits merged into master on Aug 20, 2024

Conversation

airMeng (Collaborator) commented Aug 19, 2024

There is a bug in the SYCL MMVQ implementation: it gets stuck during accuracy evaluation. This PR falls back from MMVQ to avoid the hang. Reproducer:

./bin/llama-perplexity  --hellaswag -m ~/Meta-Llama-3-8B-Instruct-Q4_K_S.gguf -s 0 -ngl 99 --hellaswag-tasks 40 -f ../hellaswag_val_full.txt
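
For reference, here is a minimal sketch of the idea behind the fallback, inferred from this PR's commit messages ("fallback mmvq to mul_mat", "mmvq in cuda path") rather than copied from the merged diff: keep the quantized mat-vec kernel only on the oneAPI CUDA backend, and let every other SYCL backend take the mul_mat path.

// Sketch only, not the exact merged change: gate mmvq on the backend so
// non-CUDA SYCL devices fall back to the dequantize + mul_mat path.
bool use_mul_mat_vec_q = ggml_is_quantized(src0->type)
    && src1->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F32
    && src1->ne[1] <= MMVQ_MAX_BATCH_SIZE;

// Keep the mmvq kernel only when the SYCL stream runs on the CUDA backend.
use_mul_mat_vec_q = use_mul_mat_vec_q
    && ctx.stream()->get_backend() == sycl::backend::ext_oneapi_cuda;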

@airMeng airMeng requested a review from joeatodd August 19, 2024 05:44
@github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels Aug 19, 2024
Review thread on ggml/src/ggml-sycl.cpp (outdated, resolved)
@airMeng airMeng requested a review from NeoZhangJianyu August 20, 2024 03:47
@airMeng airMeng force-pushed the sycl-fallback-mmvq branch from 56581db to 7c332dc Compare August 20, 2024 06:45
@NeoZhangJianyu NeoZhangJianyu merged commit 50addec into master Aug 20, 2024
55 checks passed
qnixsynapse (Contributor) commented

I am getting a performance regression with this PR (from 35 tokens/sec down to 4 tokens/sec on an Arc A750, on the latest master branch). Reverting it fixes the issue. Subscribing myself so I can look into this later.

airMeng (Collaborator, Author) commented Aug 23, 2024

> I am getting a performance regression with this PR (from 35 tokens/sec down to 4 tokens/sec on an Arc A750, on the latest master branch). Reverting it fixes the issue.

@qnixsynapse what is the command you are using?

qnixsynapse (Contributor) commented

This:

build/bin/llama-server -c 8192 -t 4 -m meta-llama-3.1-8b-instruct-iq4_xs-imat.gguf --no-mmap -ngl 99 -a "LLaMA" -b 128 --port 8080 --log-disable -s 0 -mg 0 -sm none --log-format text -dt 0.1 --metrics

I haven't fully tested it yet; I just identified the offending patch and reverted it.

qnixsynapse (Contributor) commented Sep 16, 2024

This is still causing a performance regression on quantized models (especially iq4_xs) here.

> it gets stuck during accuracy evaluation. This PR falls back from MMVQ to avoid the hang.

Can you please elaborate on why this is necessary? If Nvidia is unaffected, we can make the fallback Nvidia-exclusive.

Here is what I thought of doing:

// Enable the quantized mat-vec kernel for quantized src0 with F32 src1/dst
// within the batch limit, as before this PR.
bool use_mul_mat_vec_q = ggml_is_quantized(src0->type)
    && src1->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F32
    && src1->ne[1] <= MMVQ_MAX_BATCH_SIZE;

// Apply the extra minimum-batch-size restriction only on the oneAPI CUDA backend.
if (ctx.stream()->get_backend() == sycl::backend::ext_oneapi_cuda) {
    use_mul_mat_vec_q = use_mul_mat_vec_q && (src1->ne[1] > MMVQ_MIN_BATCH_SIZE);
}
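
In this shape, mmvq stays enabled for the quantized F32 case on every SYCL backend, as before this PR, and only the CUDA backend gets the extra lower bound. (MMVQ_MIN_BATCH_SIZE is assumed to be a new lower threshold, defined alongside the existing MMVQ_MAX_BATCH_SIZE.)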

qnixsynapse (Contributor) commented

Update: with this PR, the boolean variable use_mul_mat_vec_q is always false:

[Screenshot from 2024-09-17 20-23-08]

But after reverting this PR, the value is sometimes true:

[Screenshot from 2024-09-17 20-30-21]

cc @NeoZhangJianyu: please take a look at this.
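
A quick way to confirm this locally (a hypothetical debugging aid, not code from the PR) is to log the decision next to where use_mul_mat_vec_q is computed in ggml/src/ggml-sycl.cpp:

// Hypothetical debug helper, not part of the PR: print which matmul path is
// chosen so the regression can be confirmed without a debugger. Call it right
// after use_mul_mat_vec_q is computed, passing src1->ne[1] as the batch size.
#include <cstdint>
#include <cstdio>

static void log_mmvq_choice(bool use_mul_mat_vec_q, int64_t batch_size) {
    std::fprintf(stderr, "ggml-sycl: use_mul_mat_vec_q=%s (src1->ne[1]=%lld)\n",
                 use_mul_mat_vec_q ? "true" : "false", (long long) batch_size);
}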

@airMeng airMeng deleted the sycl-fallback-mmvq branch September 21, 2024 14:06
airMeng (Collaborator, Author) commented Sep 21, 2024

@qnixsynapse sorry for the carelessness; could you open a PR to revert it?

qnixsynapse pushed a commit to qnixsynapse/llama.cpp that referenced this pull request Sep 21, 2024
qnixsynapse (Contributor) commented

@airMeng Sure, no problem.

airMeng pushed a commit that referenced this pull request Sep 23, 2024
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* fallback mmvq to mul_mat
* mmvq in cuda path
* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Labels: ggml (changes relating to the ggml tensor library for machine learning), SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language)
5 participants