
Fix incorrect use of ctx_split for bias tensors #9063

Merged (1 commit) on Aug 17, 2024


@suhara (Contributor) commented on Aug 17, 2024

This is a separate PR that fixes the incorrect use of ctx_split for bias tensors, following the suggestions by @slaren in #8922.

Please see @slaren's message in #8922 for details.

Citing @slaren's comment:

ctx_split only makes a difference when using tensor parallelism with -sm row, which is only supported on the CUDA backend when using multiple GPUs. When using -sm row, ctx_split splits the rows of the matrix between the available GPUs. This is only supported for matrix multiplication, so it should only be used with the matrix portion of linear/dense layers. The other cases are also wrong and should be corrected as well, but it doesn't need to be done here.

As far as I can see, there are four such lines, and this PR fixes all of them.
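To illustrate why only the matrix part of a dense layer belongs in ctx_split, here is a minimal sketch that simulates row-split tensor parallelism (the `-sm row` mode described above) for y = W·x + b using plain Python lists as stand-ins for per-GPU buffers. It does not use the ggml or llama.cpp API; the function names `split_rows`, `matvec`, and `dense_row_split` are hypothetical and exist only for this illustration. The weight matrix is partitioned by rows, each "device" computes its slice of the output, and the bias, being an element-wise add rather than a matrix multiplication, is applied once to the gathered result.

```python
# Hypothetical simulation of row-split ("-sm row") tensor parallelism for a
# dense layer y = W @ x + b. Plain Python lists stand in for GPU buffers;
# this is NOT the ggml/llama.cpp API.

def split_rows(matrix, n_devices):
    """Partition the rows of `matrix` into n_devices contiguous chunks."""
    base, extra = divmod(len(matrix), n_devices)
    chunks, start = [], 0
    for d in range(n_devices):
        size = base + (1 if d < extra else 0)  # spread leftover rows over the first devices
        chunks.append(matrix[start:start + size])
        start += size
    return chunks

def matvec(rows, x):
    """Multiply a block of matrix rows by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

def dense_row_split(W, b, x, n_devices):
    # Only the weight matrix is split: each "device" multiplies its own rows.
    partial = [matvec(chunk, x) for chunk in split_rows(W, n_devices)]
    # Gather the per-device output slices back into one vector.
    y = [v for part in partial for v in part]
    # The bias add is element-wise, not a matmul, so it stays unsplit and is
    # applied once to the gathered output.
    return [yi + bi for yi, bi in zip(y, b)]

W = [[1, 2], [3, 4], [5, 6]]
b = [10, 20, 30]
x = [1, 1]
print(dense_row_split(W, b, x, 2))  # [13, 27, 41]
```

The result is the same for any number of simulated devices, which is the property row splitting must preserve; the split only changes which device computes which output rows.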

Special thanks to @slaren !

@slaren (Collaborator) left a comment:

Thanks for fixing this.

@slaren merged commit 2fb9267 into ggerganov:master on Aug 17, 2024
51 of 52 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024