NVIDIA_VISIBLE_DEVICES not working as expected #65
Comments
Makes sense.
Right, I was originally going to do that, but for reasons not worth explaining here it ended up being preferable to use the environment variable (which is advertised by Nvidia here). But I will try to switch back to using the docker flags, or rather the docker-compose configuration in my case.
If you wait, I'll edit the scripts to accommodate this use case.
Thanks! FYI I switched to doing this in my docker compose file:
But the exact same problem happened. This must end up creating an NVIDIA_VISIBLE_DEVICES environment variable anyway, because the container still failed on the same nvidia-smi call in entrypoint.sh.
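For anyone reproducing this, a rough way to confirm that a Compose GPU reservation still ends up setting NVIDIA_VISIBLE_DEVICES inside the container is to inspect the environment directly. The service name and image tag below are placeholders, not taken from this thread:

```bash
# Show what the NVIDIA runtime injected into a running Compose service
# ("desktop" is a placeholder for whatever the compose file names the service).
docker compose exec desktop env | grep NVIDIA_VISIBLE_DEVICES

# The same re-indexing reproduced with plain docker run: reserving the host's
# GPU 1 still makes it appear as GPU 0 inside the container.
docker run --rm --gpus '"device=1"' nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi -L
```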
Hi, I'm trying to run multiple containers on a multi-GPU machine, with each container assigned its own GPU. I initially tried this by setting NVIDIA_VISIBLE_DEVICES to 0 and 1 in each container, respectively, but the container with the "1" setting wouldn't launch the desktop. I narrowed this down to line 91 in entrypoint.sh:
```bash
export GPU_SELECT="$(nvidia-smi --id=$(echo ${NVIDIA_VISIBLE_DEVICES} | cut -d ',' -f1) --query-gpu=uuid --format=csv,noheader | head -n1)"
```
Prior to launching the container, NVIDIA_VISIBLE_DEVICES set to "1" tells the nvidia docker runtime to use the second GPU, i.e. the GPU with an id of 1 in the output of nvidia-smi on bare metal. However, within the container there is only one GPU visible (which is expected), and it has been reassigned the id of 0 in nvidia-smi (which is what breaks the logic). So the query "nvidia-smi --id=1 ..." within the container fails.
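Not the project's actual fix, just a sketch of how the selection could avoid the host-side index entirely: inside the container only the exposed GPUs are visible and they are always re-indexed from 0, so picking "the first visible GPU" works regardless of which numeric value NVIDIA_VISIBLE_DEVICES carried in from the host. UUIDs, unlike indices, stay valid across that boundary.

```bash
# Sketch of an index-independent alternative to the line 91 query (illustrative,
# not the maintainer's fix): take the first GPU nvidia-smi can see in-container.
export GPU_SELECT="$(nvidia-smi --query-gpu=uuid --format=csv,noheader | head -n1)"

# If NVIDIA_VISIBLE_DEVICES holds a UUID (GPU-xxxxxxxx-...), it can still be passed
# to --id directly, because UUIDs are identical on the host and in the container.
first_dev="$(echo "${NVIDIA_VISIBLE_DEVICES}" | cut -d ',' -f1)"
if [[ "${first_dev}" == GPU-* ]]; then
  export GPU_SELECT="$(nvidia-smi --id="${first_dev}" --query-gpu=uuid --format=csv,noheader | head -n1)"
fi
```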
As a workaround I can set NVIDIA_VISIBLE_DEVICES in the second container to "1,0", which exposes both GPUs while still allowing "1" to refer to the second GPU (and launching the desktop there, since it's listed first). However, for other reasons I'd prefer not to expose both GPUs to that container.
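Tracing the "1,0" workaround through the same line 91 logic shows why it behaves this way (a walk-through of the existing behavior, nothing new):

```bash
# With both GPUs exposed, the container keeps indices 0 and 1, so the first
# field of the variable still names a GPU that exists inside the container.
NVIDIA_VISIBLE_DEVICES="1,0"
echo "${NVIDIA_VISIBLE_DEVICES}" | cut -d ',' -f1   # prints "1"
# nvidia-smi --id=1 then succeeds, and (as observed above) that index still
# corresponds to the host's second GPU when both are visible.
```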
Is this an unexpected use case? I'm not sure how entrypoint.sh intends to interpret NVIDIA_VISIBLE_DEVICES in the first place, given that a numerical id means something different outside the container than inside it.