rocminfo generates error HSA_STATUS_ERROR_OUT_OF_RESOURCES #5
Comments
I suggest testing whether cpu and motherboard supports atomicOps, just run
Hi
There should be a /dev/kfd device initialized during system startup.
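A quick way to check this without rocminfo is a small script along these lines. This is only a sketch: the helper name `check_device_node` is made up for illustration, and it only looks at whether the node exists, is a character device, and is readable and writable by the current user.

```python
import os
import stat


def check_device_node(path="/dev/kfd"):
    """Return a short status string for a device node such as /dev/kfd."""
    if not os.path.exists(path):
        return "missing"
    mode = os.stat(path).st_mode
    if not stat.S_ISCHR(mode):
        # /dev/kfd is expected to be a character device
        return "not a character device"
    if not os.access(path, os.R_OK | os.W_OK):
        return "no read/write access for current user"
    return "ok"


if __name__ == "__main__":
    print("/dev/kfd:", check_device_node("/dev/kfd"))
```

If this reports "missing", the kernel driver probably did not create the node at boot; if it reports an access problem, group membership is the first thing to look at.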
My bad (typed kdf instead of kfd), dmesg does return kfd
rocminfo (with and without sudo):
I ran this code, just in case ROCM was being weird.
import tensorflow as tf
print(tf.test.gpu_device_name())
On running the above code I get this:
2021-12-31 18:14:15.175273: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-31 18:14:15.176885: E tensorflow/stream_executor/rocm/rocm_driver.cc:983] could not retrieve ROCM device count: HIP_ERROR_NoDevice
Try using it inside a Docker container:
The next part is inside the container:
With that you have ROCm, and the only thing you need is the proprietary drivers for AMD.
And for PyTorch:
I have not used PyTorch, so I don't know if it works. This way you can find out whether your problem comes from your installation of ROCm, since the container comes preinstalled with all you need, at least for TensorFlow 2.6.
@Lunatik00 Hey, I tried it out. same error:
Then it is likely something with your driver on the host machine. Do you have the mesa and amdgpu packages?
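Related to the host driver question: besides the packages, it may help to confirm that the amdgpu kernel module is actually loaded. The sketch below reads /proc/modules for that. The function names are hypothetical, and it checks the loaded module, not which packages are installed; a driver built directly into the kernel would not show up here.

```python
def loaded_modules(proc_modules_text):
    """Parse the text of /proc/modules into a set of module names."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first field of each line is the module name
            names.add(fields[0])
    return names


def amdgpu_loaded(path="/proc/modules"):
    """True if the amdgpu kernel module appears in /proc/modules."""
    with open(path) as handle:
        return "amdgpu" in loaded_modules(handle.read())
```

On the host, `amdgpu_loaded()` returning False would point at the driver setup and not at the ROCm userspace install.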
I have the same issue with ROCm 5.2 #8
After installing ROCm 4.5.0, I followed the method, added rocm-dkms and rocm-libs, and installed the rocmblas downloaded from here for ROCm 4.5.0.
When I run
rocm-smi
I get this:
But when I run
rocminfo
I get this:
I am already part of the render and video groups
salik@salik-pc:~$ groups salik adm cdrom sudo dip video plugdev kvm render lpadmin lxd sambashare libvirt docker
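For anyone checking the same thing, here is a rough sketch that compares the groups of the running process against render and video. The helper names are made up for illustration. It looks at the groups the current process actually has, which can differ from what is configured for the user if the user was added to a group and has not logged out and back in since.

```python
import grp
import os


def missing_groups(member_of, required=("render", "video")):
    """Return the required group names that are absent from member_of."""
    have = set(member_of)
    return [name for name in required if name not in have]


def current_process_groups():
    """Names of the groups the current process actually runs with."""
    names = []
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # gid has no entry in the group database
    return names


if __name__ == "__main__":
    print("missing:", missing_groups(current_process_groups()))
```

An empty list means the process already has both groups, so the rocminfo error would have to come from somewhere else.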
Any help would be appreciated.
Kernel: Linux 5.11.0-43-generic #47
OS: 20.04.2-Ubuntu
ROCM: 4.5