podman for nvidia may need --group-add keep-groups
#655
Hi! Ramalama is a very nice project.

I have some trouble making it use an nvidia GPU: on my system /dev/dri/card* are protected by the video group, and this privilege is given up by podman. This is similar to #376 and related to containers/podman#10166.
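A minimal way to see the symptom, assuming the device nodes are group-owned as described (image and paths are illustrative, not taken from this issue): rootless podman normally drops the supplementary groups that grant access to /dev/dri, and `--group-add keep-groups` (crun only) preserves them.

```bash
# Check who owns the card/render nodes on the host.
ls -l /dev/dri/

# Inside a rootless container the host's supplementary groups (e.g. video,
# render) are dropped, so the passed-in device can still be unreadable:
podman run --rm --device /dev/dri fedora id

# With crun, keep-groups preserves the host's supplementary groups:
podman run --rm --device /dev/dri --group-add keep-groups fedora id
```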
Let's just add it like @bmahabirbu's #376, since this is the second reported case. Please open a PR and ensure the flag is passed in all cases: kube, quadlet, run, serve, etc.
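For the plain run/serve path, the change amounts to one extra argument on the podman command line that RamaLama assembles; a hand-written equivalent might look like this (image name and model path are placeholders, not taken from the thread):

```bash
# Manual sketch of a GPU-enabled run with the flag under discussion.
podman run --rm -it \
  --device nvidia.com/gpu=all \
  --device /dev/dri \
  --group-add keep-groups \
  quay.io/ramalama/cuda:latest \
  llama-server --model /path/to/model.gguf --port 8080
```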
I managed to make it work with just this. My GPU is old and doesn't have much RAM, so I also had to pass --gpu-layers 16.
I think there needs to be a way to pass --gpu-layers as an option.
--gpu-layers is -ngl in the llama.cpp CLI.
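For reference, llama.cpp accepts the option under several spellings (-ngl, --gpu-layers, --n-gpu-layers); invoked directly it might look like this (model path is a placeholder):

```bash
# Offload 16 model layers to the GPU.
llama-cli -m /path/to/model.gguf -ngl 16 -p "hello"

# Long spelling, same effect, for the server binary:
llama-server -m /path/to/model.gguf --gpu-layers 16
```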
Ah, right! Passing -ngl works as well.
The keep-groups flag makes sense for rootless containers, but it would not work with the docker backend, so we need to be careful. The gpu_args change I will leave up to @ericcurtin.
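As a side note on that caution (not something stated in the thread): keep-groups is a podman/crun-specific value, so a docker backend would need the concrete group added instead, roughly:

```bash
# Docker's --group-add only takes concrete names/GIDs; "keep-groups" is
# podman/crun-specific. Resolving the host's video GID is one workaround.
# "some-image" is a placeholder.
docker run --rm --gpus all \
  --group-add "$(getent group video | cut -d: -f3)" \
  some-image id
```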
I think we might need "--gpu-layers", "16", but I'm not sure about changing 999 for llama.cpp; most of the time that's what you want: use the maximum number of layers available. We have to be careful about merging defaults that "work best on my GPU". CLI options to override defaults are fine too.
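Not from the thread, but a quick way to judge whether the 999 default will fit, before picking a lower layer count, is to check free VRAM:

```bash
# Show total and free GPU memory so a sensible --gpu-layers override can
# be chosen when the full model does not fit.
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```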
@khumarahn did you follow the nvidia cuda setup guide? https://github.com/containers/ramalama/blob/main/docs/ramalama-cuda.7.md Setting up cuda with containers this way, I no longer needed to pass --group-add keep-groups. Could also just be the environment you're using as well.
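The linked guide builds on the NVIDIA Container Toolkit's CDI support; the core steps are roughly these (paths are the toolkit defaults):

```bash
# Generate the CDI spec for the installed GPUs, then confirm podman can
# see the nvidia.com/gpu devices.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list
```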
I followed the guide, and I still need the groups. "--gpu-layers", "16" should not be hardcoded; that is just what worked for my GPU with a particular model. Ideally it should be configurable as a ramalama command line option. The reason I wanted to change it is that with 999 layers ramalama would crash, failing to allocate enough GPU memory.
Feel free to add a --ngl option to RamaLama, @khumarahn.
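Assuming the option lands as --ngl (the exact spelling is up to the PR), usage would presumably look like this (model name is a placeholder):

```bash
# Hypothetical CLI once the option exists; "tinyllama" is a placeholder.
ramalama run --ngl 16 tinyllama
ramalama serve --ngl 16 tinyllama
```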
for the --ngl bit try using -1
I didn't find this in the llama.cpp docs, and in my test -1 GPU layers only used 1 GB of GPU RAM.
Good to know; I saw it mentioned in some posts and thought it was still relevant.
-1 doesn't work in llama.cpp; in llama.cpp, 999 means use the maximum number of layers. It may work in vllm, I don't know.
But my llama.cpp crashes with 999 layers, failing to allocate enough GPU memory.
I created a PR. It seemed pretty straightforward, but please check me... |