Installation of the Library #54
Comments
@i3s93 You don't need to install anything related to the source code of cuDSS. You just need:
julia> ]
pkg> add CUDSS
It's explained in the README.md, but I should add a note that it also installs the shared library.
Thank you @amontoison for your rapid response. I actually started with the base installation in the README.md, but encountered the same error message. That is why I tried to manually set the path, but neither approach worked for me. Here is what I see on my end when I execute the code from my previous comment:
Can you remove the environment variable and then run the following?
force_recompile(package_name::String) = Base.compilecache(Base.identify_package(package_name))
force_recompile("CUDSS")
using CUDSS
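As a sketch, here is a slightly more defensive variant of the snippet above. `Base.identify_package` returns `nothing` when the name cannot be resolved in the active environment, so this version reports that case instead of passing `nothing` on to `Base.compilecache`. The name `try_force_recompile` is hypothetical and not part of CUDSS.jl.

```julia
# Hypothetical helper, not part of CUDSS.jl: rebuild a package's precompile
# cache only if the package can be found in the active environment.
function try_force_recompile(package_name::String)
    pkg = Base.identify_package(package_name)  # a PkgId, or nothing if not found
    if pkg === nothing
        @warn "Package not found in the active environment" package_name
        return false
    end
    Base.compilecache(pkg)
    return true
end

# Usage (assumes CUDSS is installed in the active environment):
# try_force_recompile("CUDSS")
```

A `false` return value means the package is not visible from the current environment, which would point to an environment or load-path problem rather than a stale cache.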
If it's still not working, what is your NVIDIA GPU and operating system / architecture?
I tried your solution, but I'm still seeing the same problem. I'm running with an NVIDIA A100 GPU with an AMD EPYC 7763 processor. The operating system is SUSE Linux Enterprise Server 15 SP4.
Did you install CUDSS.jl on a node with a GPU? You can also try clearing the downloaded artifacts: rm -rf ~/.julia/artifacts/*
Can you also display the output of:
julia> CUDSS_jll.host_platform
Linux x86_64 {cuda=none, cuda_local=false, cxxstring_abi=cxx11, julia_version=1.10.4, libc=glibc, libgfortran_version=5.0.0, libstdcxx_version=3.4.30}
On my laptop I don't have an NVIDIA GPU, so the shared library of cuDSS is not installed. Are the NVIDIA drivers installed on your computer?
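The platform tags printed above can be read programmatically. The following is a minimal sketch that works on a plain dictionary of tags. The helper name `cuda_tag_summary` is hypothetical and is not part of CUDSS_jll. Obtaining the dictionary from a real platform object via `Base.BinaryPlatforms.tags` is an assumption about how one would wire it up.

```julia
# Hypothetical helper, not part of CUDSS_jll: summarize the CUDA-related
# platform tags, given as a dictionary of strings.
function cuda_tag_summary(tags::AbstractDict{String,String})
    cuda = get(tags, "cuda", "none")
    cuda_local = get(tags, "cuda_local", "false") == "true"
    return (has_cuda = cuda != "none", cuda = cuda, cuda_local = cuda_local)
end

# Tags copied from the laptop output above (no NVIDIA GPU).
laptop_tags = Dict("cuda" => "none", "cuda_local" => "false", "libc" => "glibc")
info = cuda_tag_summary(laptop_tags)
println(info.has_cuda)  # false

# Assumed usage on a real installation:
# cuda_tag_summary(Base.BinaryPlatforms.tags(CUDSS_jll.host_platform))
```

With `cuda=none`, as on the laptop, no CUDA-specific artifact is expected to be selected, which matches the observation that the cuDSS shared library is not installed there.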
Okay, I have removed the artifacts as you have suggested. When I installed the package, I was on a node with the A100. Here is the output you requested:
I still see the same error message.
Just to follow up, I was able to install and run the code from the package locally on a laptop with an NVIDIA GPU. So far, I have only been able to see this issue when I try to install the package on a remote cluster. I will reach out to the system administrators and see if something on their end is disrupting the installation.
Are you using a module on the cluster to get Julia? It seems that you're trying to use a local CUDA installation. Assuming that wasn't your intention or your own doing, it might be a global preference that is set when you load a Julia module. By the way, which cluster is this?
@carstenbauer: This is on Perlmutter, if that helps. Here is the output of
I can run any of my Julia CUDA codes fine without the CUDA modules, so the CUDA Toolkit is not necessary. I see the same error regardless of whether or not this module is loaded.
@i3s93 I just tested this on Perlmutter. If I use the julia module ( However, if I
your test above works without any issues in a clean Julia environment that just has
The environment in the global @i3s93 did unsetting
@JBlaschke I assume the question was for me, because I was the one that did the (successful) test with
julia> CUDA.versioninfo()
CUDA runtime 12.5, artifact installation
CUDA driver 12.0
NVIDIA driver 525.105.17
CUDA libraries:
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+525.105.17
Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.1+1
- CUDA_Runtime_jll: 0.14.1+0
Toolchain:
- Julia: 1.9.4
- LLVM: 14.0.6
1 device:
0: NVIDIA A100-PCIE-40GB (sm_80, 38.984 GiB / 40.000 GiB available)
For comparison, this is if I don't unset and don't unload the
julia> CUDA.versioninfo()
CUDA runtime 12.2, local installation
CUDA driver 12.2
NVIDIA driver 525.105.17
CUDA libraries:
- CUBLAS: 12.2.1
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.0
- CUSPARSE: 12.1.1
- CUPTI: 2023.2.0 (API 20.0.0)
- NVML: 12.0.0+525.105.17
Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.1+1
- CUDA_Runtime_jll: 0.14.1+0
- CUDA_Runtime_Discovery: 0.3.4
Toolchain:
- Julia: 1.9.4
- LLVM: 14.0.6
Preferences:
- CUDA_Runtime_jll.version: 12.2
- CUDA_Runtime_jll.local: true
1 device:
0: NVIDIA A100-PCIE-40GB (sm_80, 38.984 GiB / 40.000 GiB available)
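The two outputs above differ in the `CUDA_Runtime_jll.local` preference. The following sketch shows how such a preference could be detected from the text of a preferences file using the `TOML` standard library. The file contents, the assumption that the value is stored as the string "true", and the helper name `uses_local_cuda` are all illustrative and not taken from the thread.

```julia
using TOML

# Illustrative preferences text. The keys mirror the "Preferences:" section of
# the second output above; the exact on-disk representation is an assumption.
prefs_text = """
[CUDA_Runtime_jll]
version = "12.2"
local = "true"
"""

# Hypothetical helper: true if the preferences request a local CUDA toolkit.
# Accepts the value either as a Bool or as the string "true".
function uses_local_cuda(prefs::AbstractDict)
    section = get(prefs, "CUDA_Runtime_jll", Dict{String,Any}())
    return string(get(section, "local", "false")) == "true"
end

println(uses_local_cuda(TOML.parse(prefs_text)))  # true
```

If such a preference turns out to be set globally, CUDA.jl documents `CUDA.set_runtime_version!` for changing the runtime selection. Check the documentation for your CUDA.jl version before relying on it.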
Thanks @carstenbauer for checking. One more thing: does the artifact even work on a compute node? For previous versions we would get segfaults.
It looks like we don't have a version on Perlmutter yet. I might go and check the artifact install of CUDA. If that doesn't work, I'd need to develop a module.
@JBlaschke Do you mean the artifact of cuDSS?
@amontoison No, I meant running CUDA.jl using the artifact CUDA (instead of the one provided by the OS).
On Perlmutter
@carstenbauer Thank you for taking the time to help resolve this issue! I can also confirm that unsetting
@JBlaschke Thank you for your help as well! My tests with cuDSS are small scale, so I am fine with unsetting the environment variable until a better solution becomes available.
@amontoison I greatly appreciate the timely feedback and you having a look at this problem. Since this does not appear to be an issue with CUDSS.jl, I'm fine with closing this issue, unless the others would like to continue the discussion!
I am wondering how relevant it would be to detect a local installation of cuDSS. cuDSS is still in preview, so every minor release breaks the API, and the local installation would always have to be the most recent version, which is probably hard to maintain.
@amontoison In the past, CUDA would not work at all unless you used the local install on Perlmutter. It might be the case that this is no longer necessary. I haven't had a chance to test this. Will do so soon. If it is the case that running CUDA_jll is unstable on Perlmutter, then we have no choice but to also use a local CUDSS install...
@carstenbauer @JBlaschke @i3s93 Do you know why Tim checks whether precompiling in this function
I would like to use tools from this library in one of my projects, but I'm having some difficulties with the installation process on a Linux cluster.
I have extracted and set the library path to the shared object files for cuDSS following the directions given here. After installing CUDSS.jl, I tried to execute the following test:
On the third line, I receive the following error message:
I'm not sure what I am doing wrong. I have also tried setting the environment variable JULIA_CUDSS_LIBRARY_PATH, which is used to set the path for libcudss. Something is not being set properly. I'm using CUDA.jl (v5.4.3) and CUDSS.jl (v0.3.1) on Julia v1.9, if that helps.