
podman for nvidia may need --group-add keep-groups #655

Closed · khumarahn opened this issue Jan 29, 2025 · 16 comments

@khumarahn (Collaborator) commented Jan 29, 2025

Hi! RamaLama is a very nice project.

I'm having trouble getting it to use an NVIDIA GPU: on my system, /dev/dri/card* is restricted to the video group, and podman drops that supplementary group.

$ ls -alh /dev/dri
total 0
drwxr-xr-x   3 root root        140 Jan 28 11:15 .
drwxr-xr-x  21 root root       4.5K Jan 28 13:16 ..
drwxr-xr-x   2 root root        120 Jan 28 11:15 by-path
crw-rw----+  1 root video  226,   0 Jan 28 11:15 card0
crw-rw----+  1 root video  226,   1 Jan 28 11:15 card1
crw-rw-rw-   1 root render 226, 128 Jan 28 11:15 renderD128
crw-rw-rw-   1 root render 226, 129 Jan 28 11:15 renderD129

This is similar to #376 and related to containers/podman#10166

@ericcurtin (Collaborator)

Let's just add it like @bmahabirbu's #376, since this is the second reported case.

Please open a PR and ensure it's applied in all cases: kubelet, quadlet, run, serve, etc.

@khumarahn (Collaborator, Author) commented Jan 29, 2025

I managed to make it work with just the change below. My GPU is old and doesn't have much RAM, so I also had to pass --gpu-layers to llama.cpp, which I did in the least universal way possible:

diff --git a/ramalama/model.py b/ramalama/model.py
index 710a27a..3231410 100644
--- a/ramalama/model.py
+++ b/ramalama/model.py
@@ -185,6 +185,7 @@ class Model:
             # Special case for Cuda
             if k == "CUDA_VISIBLE_DEVICES":
                 conman_args += ["--device", "nvidia.com/gpu=all"]
+                conman_args += ["--group-add", "keep-groups"]
             conman_args += ["-e", f"{k}={v}"]
         return conman_args
 
@@ -382,7 +383,7 @@ class Model:
             gpu_args = self.gpu_args(force=args.gpu)
             if gpu_args is not None:
                 exec_args.extend(gpu_args)
-            exec_args.extend(["--host", args.host])
+            exec_args.extend(["--host", args.host, "--gpu-layers", "16"])
         return exec_args
 
     def generate_container_config(self, model_path, args, exec_args):

@khumarahn (Collaborator, Author)

I think there needs to be a way to pass --gpu-layers to llama.cpp too. Ollama seems to compute it automatically somehow.

@ericcurtin (Collaborator)

--gpu-layers is -ngl in the llama.cpp CLI.
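
For context, -ngl, --gpu-layers and --n-gpu-layers are alternate spellings of the same llama.cpp option. A rough sketch of the kind of argument list being discussed; the binary name, host, and model path here are placeholder assumptions, not ramalama's actual values:

# Illustrative sketch only: how the layer-offload flag might end up on a
# llama.cpp command line. Binary name, host and model path are placeholders.
exec_args = [
    "llama-server",
    "--model", "/path/to/model.gguf",
    "--host", "0.0.0.0",
    "-ngl", "16",  # same option as --gpu-layers / --n-gpu-layers
]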

@khumarahn (Collaborator, Author)

Ah, right! This also works:

diff --git a/ramalama/model.py b/ramalama/model.py
index 710a27a..ad87b85 100644
--- a/ramalama/model.py
+++ b/ramalama/model.py
@@ -185,6 +185,7 @@ class Model:
             # Special case for Cuda
             if k == "CUDA_VISIBLE_DEVICES":
                 conman_args += ["--device", "nvidia.com/gpu=all"]
+                conman_args += ["--group-add", "keep-groups"]
             conman_args += ["-e", f"{k}={v}"]
         return conman_args
 
@@ -206,7 +207,7 @@ class Model:
             else:
                 gpu_args += ["-ngl"]  # single dash
 
-            gpu_args += ["999"]
+            gpu_args += ["16"]
 
         return gpu_args
 

@khumarahn reopened this Jan 29, 2025
@rhatdan (Member) commented Jan 29, 2025

The keep-groups flag makes sense for rootless containers, but it would not work with the Docker backend, so we need to be careful.

The gpu_args change I will leave up to @ericcurtin.
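
One way to handle this concern, as a minimal sketch (the engine check below is hypothetical, not ramalama's actual logic), is to add the flag only when the container engine is podman:

# Minimal sketch, assuming the engine name is known: only podman understands
# the special "keep-groups" value for --group-add, so guard it accordingly.
def cuda_device_args(engine: str) -> list[str]:
    args = ["--device", "nvidia.com/gpu=all"]
    if engine == "podman":  # docker would reject "keep-groups"
        args += ["--group-add", "keep-groups"]
    return args

print(cuda_device_args("podman"))  # ['--device', 'nvidia.com/gpu=all', '--group-add', 'keep-groups']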

@ericcurtin (Collaborator) commented Jan 29, 2025

I think we might need:

"--gpu-layers", "16"

but I'm not sure about changing 999 for llama.cpp; most of the time that is what you want: offload the maximum number of layers available.

We have to be careful about merging defaults that "work best on my GPU". CLI options to override the defaults are fine, though.

@bmahabirbu (Collaborator) commented Jan 29, 2025

@khumarahn did you follow the NVIDIA CUDA setup guide at https://github.com/containers/ramalama/blob/main/docs/ramalama-cuda.7.md? After setting up CUDA with containers this way, I no longer needed to pass --group-add keep-groups.

It could also just be the environment you're using.

@khumarahn (Collaborator, Author)

I followed the guide; I still need the groups.

"--gpu-layers", "16" should not be hardcoded; that is just what worked for my GPU with a particular model. Ideally it should be configurable as a ramalama command-line option. The reason I wanted to change it is that with 999 layers ramalama would crash, failing to allocate enough GPU memory.

@ericcurtin (Collaborator)

Feel free to add a --ngl option to RamaLama, @khumarahn.
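
A minimal sketch of what such an option could look like, assuming an argparse-based CLI; the wiring below is illustrative, not the project's actual code:

import argparse

# Illustrative only: expose the number of offloaded layers as a CLI option
# instead of hardcoding it; 999 keeps llama.cpp's "offload everything" default.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--ngl",
    type=int,
    default=999,
    help="number of model layers to offload to the GPU",
)
args = parser.parse_args(["--ngl", "16"])
gpu_args = ["-ngl", str(args.ngl)]
print(gpu_args)  # ['-ngl', '16']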

@bmahabirbu (Collaborator)

For the --ngl bit, try using n_gpu_layers = -1 and see if that fixes the issue. Supposedly that automatically offloads the correct number of GPU layers.

@khumarahn (Collaborator, Author)

> For the --ngl bit, try using n_gpu_layers = -1 and see if that fixes the issue. Supposedly that automatically offloads the correct number of GPU layers.

I didn't find this in the llama.cpp docs, and in my test -1 GPU layers only used 1 GB of GPU RAM.

@bmahabirbu (Collaborator)

Good to know. I saw it mentioned in some posts and thought it was still relevant.

@ericcurtin (Collaborator)

> For the --ngl bit, try using n_gpu_layers = -1 and see if that fixes the issue. Supposedly that automatically offloads the correct number of GPU layers.

> I didn't find this in the llama.cpp docs, and in my test -1 GPU layers only used 1 GB of GPU RAM.

-1 doesn't work in llama.cpp; in llama.cpp, 999 means use the maximum number of layers.

It may work in vLLM, I don't know.

@khumarahn (Collaborator, Author)

> -1 doesn't work in llama.cpp; in llama.cpp, 999 means use the maximum number of layers.

But my llama.cpp crashes with 999 layers with:

ggml_backend_cuda_buffer_type_alloc_buffer: allocating 18508.35 MiB on device 0: cudaMalloc failed: out of memory

@khumarahn (Collaborator, Author)

I created a PR. It seemed pretty straightforward, but please double-check it...
