
Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag #8115

Merged
2 commits merged on Jun 26, 2024

Conversation

isaac-mcfadyen (Contributor)
Summary

  • In CUDA: use MMQ instead of cuBLAS by default #8075, MMQ was enabled by default on GPUs with int8 tensor core support.
  • A short description of the LLAMA_CUDA_FORCE_MMQ flag was added to the README. As currently written, though, the message makes it seem like MMQ will not be used unless it is enabled with the flag.
  • The message says the flag "forces MMQ to be enabled on GPUs without int8 support", but it does not mention that MMQ is already enabled by default on GPUs with int8 support.
  • This PR adds a short blurb stating that MMQ is enabled by default on GPUs with int8 tensor core support, and that the flag forces it on for all GPUs.
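For context, forcing MMQ at build time looks roughly like the following. This is a sketch: the exact option names shown here match the build flags around mid-2024 and should be verified against the current README.

```shell
# Makefile build: enable CUDA and force the MMQ kernels on all GPUs,
# including those without int8 tensor core support
make LLAMA_CUDA=1 LLAMA_CUDA_FORCE_MMQ=1

# CMake build: equivalent options (names assumed from the same era)
cmake -B build -DLLAMA_CUDA=ON -DLLAMA_CUDA_FORCE_MMQ=ON
cmake --build build --config Release
```

Without the flag, GPUs with int8 tensor core support use MMQ by default and other GPUs fall back to cuBLAS, which is what the README blurb in this PR clarifies.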

@mofosyne mofosyne added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label Jun 25, 2024
@slaren slaren requested a review from JohannesGaessler June 25, 2024 20:13
README.md — review comment (outdated, resolved)
Co-authored-by: Johannes Gäßler <[email protected]>
@JohannesGaessler JohannesGaessler merged commit 8854044 into ggerganov:master Jun 26, 2024
7 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jun 28, 2024
…#8115)

* Add message about int8 support

* Add suggestions from review

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jun 30, 2024
…#8115)

* Add message about int8 support

* Add suggestions from review

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
MagnusS0 pushed a commit to MagnusS0/llama.cpp-normistral-tokenizer that referenced this pull request Jul 1, 2024
…#8115)

* Add message about int8 support

* Add suggestions from review

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>