Update model.py to enable CUDA when available #478

Closed · wants to merge 2 commits
22 changes: 21 additions & 1 deletion ramalama/model.py
@@ -3,6 +3,8 @@
import glob
import atexit
import shlex
import subprocess
import shutil

from ramalama.common import (
default_image,
@@ -152,6 +154,16 @@ def setup_container(self, args):
        gpu_type, gpu_num = get_gpu()
        if gpu_type == "HIP_VISIBLE_DEVICES" or gpu_type == "ASAHI_VISIBLE_DEVICES":
            conman_args += ["-e", f"{gpu_type}={gpu_num}"]

        # podman is not a drop-in replacement for docker, so we need to check which
        # engine we are running under, similar to the `available` logic in common.py.
        # Risk of duplicating code here; maintenance concern.
        if gpu_type == "CUDA_VISIBLE_DEVICES":
            if shutil.which("podman"):
                # TODO: is /all/ appropriate?
                conman_args += ["--device", "nvidia.com/gpu=all"]  # AFAIK, CUDA requires this for podman
            else:
                conman_args += ["--gpus", "all"]  # and this for Docker

        return conman_args
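
A minimal sketch of the engine check that comment refers to (not taken from this PR; the helper name is hypothetical, and it assumes the `available` logic in common.py amounts to a PATH lookup):

    import shutil

    def container_manager():
        # Prefer podman when both engines are installed, mirroring the
        # shutil.which("podman") test used in the diff above.
        for engine in ("podman", "docker"):
            if shutil.which(engine):
                return engine
        return None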

    def run_container(self, args, shortnames):
@@ -389,7 +401,15 @@ def get_gpu():
        content = file.read()
        if "asahi" in content.lower():
            return "ASAHI_VISIBLE_DEVICES", 1

    try:
        # TODO: I don't currently have access to a PC with multiple NVIDIA GPUs, nor an NVIDIA Mac,
        # but I *think* every Linux and Windows machine with a modern NVIDIA GPU will have nvidia-smi,
        # and that the number of output lines corresponds to the number of zero-indexed GPUs.
        check_output = subprocess.run(['nvidia-smi', '-L'], check=True, capture_output=True)  # shell out to nvidia-smi
Contributor:
suggestion: Redundant error checking with subprocess.run()

Using check=True together with a manual returncode check is redundant, as check=True already raises CalledProcessError on non-zero exit codes.

        check_output = subprocess.run(['nvidia-smi', '-L'], capture_output=True, check=True)
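
A minimal sketch of that point (the helper name is made up): with check=True, a non-zero exit raises subprocess.CalledProcessError, so the surrounding try/except already handles failure and no explicit returncode test is needed.

    import subprocess

    def detect_cuda_gpus():
        try:
            # check=True raises CalledProcessError on non-zero exit;
            # FileNotFoundError is raised if nvidia-smi is not installed.
            out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, check=True, text=True)
            return "CUDA_VISIBLE_DEVICES", len(out.stdout.splitlines())
        except (subprocess.CalledProcessError, FileNotFoundError):
            return None, None  # no usable nvidia-smi; fall through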

Member:
Use run_cmd

Collaborator (@bmahabirbu), Nov 25, 2024:
We can go a step further and query nvidia-smi itself to get more info! For example, the command nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits | sort -t, -k2 -nr | head -n 1 gets us the GPU with the largest VRAM and its id, formatted as "id, vram-in-MB".

We can do something like this:

    try:
        command = ['nvidia-smi', '--query-gpu=index,memory.total', '--format=csv,noheader,nounits']
        output = run_cmd(command)
        gpus = output.stdout.strip().split('\n')
        gpus_sorted = sorted(gpus, key=lambda x: int(x.split(',')[1]), reverse=True)
        return "CUDA_VISIBLE_DEVICES", gpus_sorted[0][0]
    except Exception:
        pass  # fall through
    return None, None
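
A possible refinement of the sketch above (untested): split the CSV line on the comma rather than indexing its first character, so multi-digit GPU indices also work. It assumes run_cmd returns a CompletedProcess-like object with text stdout, as used in the comment above; the helper name is hypothetical.

    from ramalama.common import run_cmd  # assumed helper, as used in the comment above

    def best_cuda_gpu():
        try:
            output = run_cmd(["nvidia-smi", "--query-gpu=index,memory.total",
                              "--format=csv,noheader,nounits"])
            # Each line looks like "0, 24576"; keep index/VRAM pairs and pick the largest VRAM.
            gpus = [line.split(",") for line in output.stdout.strip().splitlines()]
            index, _vram = max(gpus, key=lambda gpu: int(gpu[1]))
            return "CUDA_VISIBLE_DEVICES", index.strip()
        except Exception:
            return None, None  # fall through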

Contributor (author):
I think that's a fabulous idea. My inclination in general is to avoid shelling out, but I don't think we're going to find a better or more lightweight way to test for the presence of NVIDIA and CUDA. That's probably why all the NVIDIA Container Toolkit docs seem to use it as a sanity check for installs.

Collaborator:
Same, I'd rather not shell out myself, but doing it this way avoids complications across different systems. If we can assume a system has an NVIDIA GPU, then most likely the drivers, and with them nvidia-smi, are installed as well.

Right now the Vulkan backend for llama.cpp doesn't have all the functionality that CUDA and hipBLAS do. But later down the line I'd like to switch to Vulkan and use the Vulkan SDK to query GPU data, since it supports AMD, NVIDIA, and Intel graphics.

        if not check_output.returncode:  # if command exited with EXIT_SUCCESS
            return "CUDA_VISIBLE_DEVICES", len(check_output.stdout.splitlines())  # return CUDA type and GPU count
    except Exception:
        pass  # fall through

    return None, None

