
ggml : dynamic ggml_sched_max_splits based on graph_size #9047

Merged — 2 commits merged into ggerganov:master on Aug 16, 2024

Conversation

nicoboss
Contributor

This fixes #9044

Sets ggml_sched_max_splits equal to graph_size, as recommended by @slaren in #9044 (comment), since there is at most one split for each node in the graph.

Thanks to this change I was able to run GPU-accelerated inference on BigLlama-3.1-681B-Instruct, which previously caused llama.cpp to crash.

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Aug 15, 2024
ggml/src/ggml-backend.c — review comment (outdated, resolved)
@slaren slaren merged commit e3f6fd5 into ggerganov:master Aug 16, 2024
51 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 15, 2024

* ggml : Dynamic ggml_sched_max_splits based on graph_size
* Fixed and readded debug code for causes

arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 18, 2024

* ggml : Dynamic ggml_sched_max_splits based on graph_size
* Fixed and readded debug code for causes
Labels
ggml changes relating to the ggml tensor library for machine learning
Successfully merging this pull request may close these issues:

Bug: GGML_SCHED_MAX_SPLITS must be increased to run BigLlama-3.1-681B-Instruct using GPU acceleration
2 participants