
Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag #8115

Merged
2 commits merged on Jun 26, 2024

Conversation

isaac-mcfadyen (Contributor)
Summary

  • In CUDA: use MMQ instead of cuBLAS by default #8075, MMQ was enabled by default on GPUs with int8 tensor core support.
  • A short description of the LLAMA_CUDA_FORCE_MMQ flag was added to the README. As currently written, though, the message makes it seem like MMQ will not be used unless it is enabled with the flag.
  • The message says the flag "forces MMQ to be enabled on GPUs without int8 support", but it does not mention that MMQ is already enabled by default on GPUs with int8 support.
  • This PR adds a short blurb stating that MMQ is enabled by default on GPUs with int8 tensor core support, and that the flag forces it on for all GPUs.
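For context, forcing MMQ at build time looks roughly like the following. This is a sketch: the exact option names shown here match the build flags around mid-2024 and should be verified against the current README.

```shell
# Makefile build: enable CUDA and force the MMQ kernels on all GPUs,
# including those without int8 tensor core support
make LLAMA_CUDA=1 LLAMA_CUDA_FORCE_MMQ=1

# CMake build: equivalent options (names assumed from the same era)
cmake -B build -DLLAMA_CUDA=ON -DLLAMA_CUDA_FORCE_MMQ=ON
cmake --build build --config Release
```

Without the flag, GPUs with int8 tensor core support use MMQ by default and other GPUs fall back to cuBLAS, which is what the README blurb in this PR clarifies.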

@mofosyne mofosyne added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label Jun 25, 2024
@slaren slaren requested a review from JohannesGaessler June 25, 2024 20:13
README.md — review comment (outdated, resolved)
Co-authored-by: Johannes Gäßler <[email protected]>
@JohannesGaessler JohannesGaessler merged commit 8854044 into ggerganov:master Jun 26, 2024
7 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jun 28, 2024
…#8115)

* Add message about int8 support

* Add suggestions from review

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jun 30, 2024
…#8115)

* Add message about int8 support

* Add suggestions from review

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
MagnusS0 pushed a commit to MagnusS0/llama.cpp-normistral-tokenizer that referenced this pull request Jul 1, 2024
…#8115)

* Add message about int8 support

* Add suggestions from review

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>