[SYCL] fallback mmvq #9088
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Force-pushed from 56581db to 7c332dc
I am getting a performance regression with this PR (from 35 tokens/sec down to 4 tokens/sec on an Arc A750, using the latest master branch). Reverting this fixes the issue. Subscribing so I can check back later.
@qnixsynapse what command are you using?
This:
I haven't fully tested it yet; I just identified the offending patch and reverted it.
This is still causing a performance regression on quantized models (especially iq4_xs) here.
Can you please elaborate on why this is necessary? If Nvidia is unaffected by it, we can make it Nvidia-exclusive. Here is what I thought of doing:

    bool use_mul_mat_vec_q = ggml_is_quantized(src0->type)
        && src1->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F32
        && src1->ne[1] <= MMVQ_MAX_BATCH_SIZE;
    if (ctx.stream()->get_backend() == sycl::backend::ext_oneapi_cuda) {
        use_mul_mat_vec_q = use_mul_mat_vec_q && (src1->ne[1] > MMVQ_MIN_BATCH_SIZE);
    }
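The backend-conditional check proposed above can be sketched as a standalone function, so that its behavior for different batch sizes is easy to inspect without a SYCL toolchain. This is only an illustration under assumptions: the `Backend` enum and `MatMulCase` struct are hypothetical stand-ins for `sycl::backend` and the ggml tensors, and the two constant values are placeholders, not the values used in ggml-sycl.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder values chosen for illustration only (assumption);
// the real constants are defined in the ggml SYCL backend.
constexpr int64_t MMVQ_MAX_BATCH_SIZE = 8;
constexpr int64_t MMVQ_MIN_BATCH_SIZE = 4;

// Hypothetical stand-in for sycl::backend.
enum class Backend { level_zero, cuda };

// Hypothetical summary of the properties the snippet reads from src0, src1 and dst.
struct MatMulCase {
    bool    src0_quantized;  // ggml_is_quantized(src0->type)
    bool    src1_f32;        // src1->type == GGML_TYPE_F32
    bool    dst_f32;         // dst->type == GGML_TYPE_F32
    int64_t batch;           // src1->ne[1]
    Backend backend;         // backend of the current stream
};

// Mirrors the proposal: the upper batch bound applies everywhere,
// the lower batch bound only on the CUDA backend.
constexpr bool use_mul_mat_vec_q(const MatMulCase & c) {
    bool use = c.src0_quantized && c.src1_f32 && c.dst_f32
            && c.batch <= MMVQ_MAX_BATCH_SIZE;
    if (c.backend == Backend::cuda) {
        use = use && (c.batch > MMVQ_MIN_BATCH_SIZE);
    }
    return use;
}

// Small self-check showing how batch size and backend interact.
inline void demo_checks() {
    // Batch size 1 on a non-CUDA backend: MMVQ stays enabled.
    assert(use_mul_mat_vec_q({true, true, true, 1, Backend::level_zero}));
    // The same case on CUDA is excluded by the lower bound.
    assert(!use_mul_mat_vec_q({true, true, true, 1, Backend::cuda}));
    // A mid-sized batch on CUDA passes both bounds.
    assert(use_mul_mat_vec_q({true, true, true, 6, Backend::cuda}));
}
```

With these placeholder constants, a batch of 1 keeps the MMVQ path on a non-CUDA backend, while on CUDA only batches strictly above the minimum and at most the maximum use it.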
Update: with this PR, the boolean variable use_mul_mat_vec_q is always false. After reverting this PR, the value is sometimes true. cc @NeoZhangJianyu, please look at this.
@qnixsynapse sorry for the carelessness. Could you open a PR to revert it?
This reverts commit 50addec.
@airMeng Sure, no problem.
* fallback mmvq to mul_mat
* mmvq in cuda path
* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <[email protected]>

---------

Co-authored-by: Alberto Cabrera Pérez <[email protected]>
There is a bug in the SYCL MMVQ implementation: it hangs when evaluating accuracy. This PR falls back from MMVQ to mul_mat to avoid the hang.