
CUDA 11.3? #23

Open
Linux-cpp-lisp opened this issue Feb 23, 2022 · 9 comments

Comments

@Linux-cpp-lisp

Hi all,

Thanks for your work packaging CUDA in an easy way for system76 machines!

PyTorch has moved up to CUDA 11.3 (see https://pytorch.org/get-started/locally/); does System76 plan to keep these packages up to date with NVIDIA's releases, or should I install directly from NVIDIA when I need a newer CUDA?

Thanks!

@mraxilus

mraxilus commented May 12, 2022

Better yet, why not 11.6 since that's what is included with system76-driver-nvidia by default anyway?

@mmstick
Member

mmstick commented May 12, 2022

It's recommended to build CUDA software in a devcontainer with Docker or Podman.
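For example, a rough sketch of that approach (the image tag is an assumption; pick whichever toolkit version your project needs, and Podman works the same way once the NVIDIA container hook is configured):

    # run nvcc from an upstream CUDA development image against the current directory
    docker run --rm --gpus all \
        -v "$PWD":/workspace -w /workspace \
        nvidia/cuda:11.3.1-devel-ubuntu20.04 \
        nvcc --version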

@Linux-cpp-lisp
Author

Better yet, why not 11.6 since that's what is included with system76-driver-nvidia by default anyway?

@mraxilus is that true? Maybe that's just on the latest 22.04... definitely wasn't the case for me before on 21.10.

@mraxilus

@mraxilus is that true? Maybe that's just on the latest 22.04... definitely wasn't the case for me before on 21.10.

I was mistakenly reading the CUDA version reported by nvidia-smi instead of the one reported by nvcc --version. The latest in System76's packages is still 11.2.
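For anyone hitting the same confusion, a quick way to compare the two (a minimal sketch; both commands need the driver and toolkit installed, respectively):

    nvidia-smi | head -n 3    # the "CUDA Version" shown here is the driver's supported ceiling
    nvcc --version            # this is the CUDA toolkit actually installed on the system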

@mraxilus

It's recommended to build CUDA software in a devcontainer with Docker or Podman.

If that's so, then why provide the system76-cu* packages at all? I don't want to have to spin up Docker containers just to access my GPU from a script or to test out features from a library with CUDA capabilities.

@gully

gully commented Jun 3, 2022

👋 Thanks for supporting these convenient CUDA installs! Question:

I'm encountering this same friction point. I go to install PyTorch, but the choices for prebuilt binaries are either CUDA 10.2 or 11.3. I can get 11.1 or 11.2 from System76, but not 11.3. I tried installing PyTorch from source, but that's a whole other issue.
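For reference, grabbing one of the System76 builds looks roughly like this (a sketch; the exact package names are an assumption based on the system76-cu* naming mentioned above, so check apt search for the real ones):

    apt search system76-cuda                                   # list the versioned packages available
    sudo apt install system76-cuda-11.2 system76-cudnn-11.2    # assumed package names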

I'd be open to a Docker or Podman route, but it's currently at odds with my development workflow, and it would add some more mental overhead to navigate. A CUDA 11.3 fix would slot right into my existing workflow.

If anyone finds this and has a working solution for setting up CUDA 11.3 manually on Pop!_OS, can you share? I may try it and share if I find a workaround...

@mmstick
Member

mmstick commented Jun 3, 2022

Dev containers are the way to go

@gully

gully commented Jun 3, 2022

OK, my workaround is to fall back to CUDA 10.2. Both System76 and PyTorch have binaries for 10.2, so it just works out of the box. I tried it on my particular PyTorch application and it appears to have worked.
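For anyone following along, the 10.2 route looked roughly like this at the time (a sketch; the invocation comes from the PyTorch "get started" selector, and treating it as current is an assumption):

    # install the CUDA 10.2 build of PyTorch via conda
    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
    # quick sanity check that the GPU is visible
    python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"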

I suspect you're right that, in the long term, dev containers make portable and reproducible environments easier. For some reason dev containers still haven't taken off in scientific computing, or at least in my sub-community of it. Is there a migration guide available or planned? I found this NVIDIA website that seems streamlined. Is that the dev container workflow y'all would recommend?

If I get around to trying it out, I'd be open to writing one of those "support" guides you have in your documentation. I adore that your docs are all open source! So cool.

@NickleDave

@gully (and anyone else this helps)
To run PyTorch in a dev container, I followed this tutorial:
https://blog.roboflow.com/nvidia-docker-vscode-pytorch/
but ended up needing to install nvidia-docker following a comment on this gist:
https://gist.github.com/kuang-da/2796a792ced96deaf466fdfb7651aa2e
specifically
https://gist.github.com/kuang-da/2796a792ced96deaf466fdfb7651aa2e?permalink_comment_id=4186634#gistcomment-4186634

  • sudo apt install nvidia-docker2
  • set the option no-cgroups = true in /etc/nvidia-container-runtime/config.toml (not control.toml in spite of what the comment says)
  • run with flags as that comment suggests; e.g. to test, docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi (a containerized PyTorch check is sketched below)
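Once nvidia-smi works inside a container like that, the same flags can run a CUDA 11.3 PyTorch image, which gets back to the original question here; a minimal sketch (the image tag is an assumption, check Docker Hub for current ones):

    docker run --rm --gpus all --privileged -v /dev:/dev \
        pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime \
        python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"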
