GPU Runner crash in Ollama when offloading multiple layers #12513
Comments
Hi @pauleseifert. I think this should be an OOM issue; you may try to set …
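(The variable name was cut off in the capture above. On Intel-GPU Ollama builds such as ipex-llm, the layer-offload count is commonly controlled through `OLLAMA_NUM_GPU`; treating that as an assumption rather than the maintainer's exact suggestion, setting it before starting the server would look roughly like this:)

```sh
# Assumed variable name -- the original suggestion was truncated in the capture.
# Lowering the number of offloaded layers can work around GPU OOM.
export OLLAMA_NUM_GPU=1   # offload only one layer as a test
ollama serve
```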
Hi @sgwhat. I agree, that's what it looks like. ENV …
This doesn't help; the runner still crashes. intel_gpu_top showed normal behavior for the short moment the runner was visible. There are no other processes running, so all memory should be available.
Can you provide the memory usage before and after running …?
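(A rough way to capture that with standard tools; `intel_gpu_top` is already mentioned above, and the exact flags should be checked against the installed version:)

```sh
# Sketch: record host memory before and after the model loads, and log
# GPU activity in between. The model name is a placeholder, not the exact
# command that was truncated above.
free -h > mem_before.txt
sudo intel_gpu_top -l > gpu_log.txt &   # text-mode logging (assumed flag)
ollama run <model> "hello"              # one-shot prompt so the command returns
free -h > mem_after.txt
kill %1                                 # stop the intel_gpu_top logger
```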
Hi,
I'm experiencing crashes of the GPU runner when offloading multiple layers to the GPU.
It seems to work for one layer, but the error message is not really helpful. The GPU is small (a 4 GB Arc A310), but so is the model (a Llama model, 1.87 GiB model size), so VRAM shouldn't be the problem.
I use Docker on Debian with kernel 6.6.44 and the following docker compose:
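(The compose file itself did not survive the page capture. As a rough reconstruction only, assuming the ipex-llm Intel-GPU image and the default Ollama port, a minimal setup might look like the sketch below; every name in it is an assumption, not the author's original file:)

```sh
# Hypothetical reconstruction of a docker-compose.yml for an Intel GPU setup --
# the image name, port, and env vars are illustrative assumptions.
cat > docker-compose.yml <<'EOF'
services:
  ollama:
    image: intelanalytics/ipex-llm-inference-cpp-xpu:latest  # assumed image
    devices:
      - /dev/dri:/dev/dri          # pass the Intel GPU into the container
    environment:
      - OLLAMA_HOST=0.0.0.0        # listen on all interfaces
    ports:
      - "11434:11434"              # default Ollama port
    restart: unless-stopped
EOF
docker compose up -d
```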
Any ideas for further debugging? Full logs below.