Bug: GGML_SCHED_MAX_SPLITS must be increased to run BigLlama-3.1-681B-Instruct using GPU acceleration #9044
Labels
bug-unconfirmed
medium severity
What happened?
When running inference on BigLlama-3.1-681B-Instruct using GPU acceleration, llama.cpp crashed with `GGML_ASSERT(i_split < GGML_SCHED_MAX_SPLITS) failed`. CPU inference works without any issues. The crash occurs regardless of which GPU backend is used or whether any layers are offloaded to the GPU. Increasing `GGML_SCHED_MAX_SPLITS` to 4096 fixed the crash and made GPU-accelerated inference work without any issues: https://github.com/ggerganov/llama.cpp/blob/4b9afbbe9037f8a2d659097c0c7d9fce32c6494c/ggml/src/ggml-backend.c#L1022

Issue of GGML_SCHED_MAX_SPLITS being a compile-time constant
Having `GGML_SCHED_MAX_SPLITS` as a compile-time constant is problematic because changing it requires recompiling llama.cpp from source. While this is relatively easy when using llama.cpp directly, as soon as you deal with third-party software that uses backend-specific pre-built llama-cpp-python bindings (like oobabooga/text-generation-webui), changing `GGML_SCHED_MAX_SPLITS` is unfeasible for the general user.
Possible solutions
Evaluation of possible solutions
I believe determining a suitable value for `GGML_SCHED_MAX_SPLITS` at runtime, based on the model being loaded, would be simple. At the following location the actual number of splits is stored into `sched->n_splits`: https://github.com/ggerganov/llama.cpp/blob/4b9afbbe9037f8a2d659097c0c7d9fce32c6494c/ggml/src/ggml-backend.c#L1618
This is the only place where `GGML_SCHED_MAX_SPLITS` is used outside the assert and some disabled debug code:
https://github.com/ggerganov/llama.cpp/blob/4b9afbbe9037f8a2d659097c0c7d9fce32c6494c/ggml/src/ggml-backend.c#L1868-L1875
Proposed solution
The assert could be removed and `GGML_SCHED_MAX_SPLITS` replaced with `max(2048, sched->n_splits)` at ggml-backend.c#L1868 and ggml-backend.c#L1874 to resolve this issue.

Name and Version
version: 3590 (4b9afbb)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output