ZLUDA
ZLUDA (CUDA Wrapper) for AMD GPUs in Windows
The official ZLUDA build does not fully support PyTorch, so ZLUDA support in SD.Next is tricky, unstable, and limited at this time. Please don't create issues regarding ZLUDA on GitHub. Instead, feel free to reach out via the ZLUDA thread in the help channel on Discord.
This guide assumes you have git and python installed, have used SD.Next before, and are comfortable using the command prompt, navigating Windows Explorer, renaming files and folders, and working with zip files.
A list of compatible GPUs can be found here.
If your GPU is not on the list, you may need to build your own rocBLAS libraries. To do so, follow the instructions in the ROCm Support guide.
(Note: including integrated GPUs)
Note: If you have an integrated AMD GPU (iGPU), you may need to disable it, or use the `HIP_VISIBLE_DEVICES` environment variable. Learn more here.
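As a rough illustration of the environment-variable approach, the sketch below builds a copy of the environment with `HIP_VISIBLE_DEVICES` set to a comma-separated list of device indices. The helper name `env_with_visible_devices` is hypothetical, and the index `1` is only an assumption: which index belongs to your discrete GPU depends on your system.

```python
import os


def env_with_visible_devices(device_indices, base_env=None):
    """Return a copy of the environment with HIP_VISIBLE_DEVICES set.

    device_indices: HIP device indices to expose (system-specific values).
    base_env: mapping to copy; defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_indices)
    return env


# Assumption for illustration only: the discrete GPU is device 1.
env = env_with_visible_devices([1], base_env={})
print(env["HIP_VISIBLE_DEVICES"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching SD.Next from a script, which leaves the system-wide environment untouched.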
Note: Almost everyone already has the Visual C++ Runtime, since it comes with a lot of games, but there's no harm in trying to install it.
Grab the latest version of Visual C++ Runtime from https://aka.ms/vs/17/release/vc_redist.x64.exe (this is a direct download link) and then run it.
If you get the options to Repair or Uninstall, then you already have it installed and can click Close. Otherwise, install it.
ZLUDA is now auto-installed, and automatically added to PATH, when starting `webui.bat` with `--use-zluda`.
Install HIP SDK 5.7 from https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
So long as your regular AMD GPU driver is up to date, you don't need to install the PRO driver HIP SDK suggests.
Click the Start button, type "env", click "Edit the system environment variables", then click the "Environment Variables" button at the bottom.
In the bottom set of variables, check to see if `HIP_PATH` is there. If not, click the bottom "New" button.
Set the variable name to `HIP_PATH` and the variable value to `C:\Program Files\AMD\ROCm\5.7\`, then click OK.
Note: If you installed the HIP SDK to another location, adjust the variable value accordingly.
Add `%HIP_PATH%bin` to your PATH. For step-by-step instructions, see https://github.com/brknsoul/ROCmLibs/wiki/Adding-folders-to-PATH
Note: `%HIP_PATH%bin` typically resolves to `C:\Program Files\AMD\ROCm\5.7\bin`, assuming Windows is installed on C:.
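To double-check the two settings above, a small script can confirm that `HIP_PATH` is set and that its `bin` folder appears on PATH. This is a minimal sketch, not part of SD.Next: `check_hip_env` is a hypothetical helper, and it uses `ntpath` so that Windows path rules apply wherever it runs.

```python
import ntpath
import os


def check_hip_env(env=None):
    """Return a list of problems with HIP_PATH and PATH (empty if none)."""
    env = os.environ if env is None else env
    problems = []

    hip_path = env.get("HIP_PATH", "")
    if not hip_path:
        problems.append("HIP_PATH is not set")
        return problems

    # Normalise both sides so trailing slashes and letter case don't matter.
    expected = ntpath.normcase(ntpath.normpath(ntpath.join(hip_path, "bin")))
    entries = [
        ntpath.normcase(ntpath.normpath(entry))
        for entry in env.get("PATH", "").split(";")
        if entry
    ]
    if expected not in entries:
        problems.append("PATH does not contain " + ntpath.join(hip_path, "bin"))
    return problems


if __name__ == "__main__":
    for problem in check_hip_env():
        print("Problem:", problem)
```

Run it from a newly opened command prompt, since prompts that were already open do not pick up changes to environment variables.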
Go to https://rocm.docs.amd.com/projects/install-on-windows/en/develop/reference/system-requirements.html and find your GPU model.
If your GPU model has a ✅ in both columns then skip to Compilation, Settings, and First Generation.
If your GPU model has an ❌ in the HIP SDK column (LLVM targets gfx1031 and gfx1032), follow the instructions below:
- Open Windows Explorer and copy and paste `%HIP_PATH%bin\rocblas` into the location bar.
- Rename `library` to something else, like `origlibrary`.
- Note: Thanks to FremontDango, these alternate libraries for gfx1031 and gfx1032 GPUs are about 50% faster:
- If you have a 6700, 6700xt, or 6750xt (gfx1031) GPU, download Optimised_ROCmLibs_gfx1031.7z.
- If you have a 6600, 6600xt, or 6650xt (gfx1032) GPU, download Optimised_ROCmLibs_gfx1032.7z.
(Note: You may have to install 7-Zip to unzip the optimised .7z files.)
- Open the downloaded archive.
- Drag and drop the `library` folder from the archive into `%HIP_PATH%bin\rocblas` (the folder you opened in the first step).
- Reboot your PC.
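The rename-and-copy steps above can also be scripted. The following is a minimal sketch under stated assumptions: `swap_rocblas_library` is a hypothetical helper, the archive has already been extracted somewhere, and the script may need an elevated prompt to write under `C:\Program Files`.

```python
import shutil
from pathlib import Path


def swap_rocblas_library(rocblas_dir, new_library, backup_name="origlibrary"):
    """Back up rocblas/library and copy a replacement folder into its place.

    rocblas_dir: the folder that %HIP_PATH%bin\\rocblas resolves to.
    new_library: the extracted 'library' folder from the downloaded archive.
    """
    rocblas_dir = Path(rocblas_dir)
    library = rocblas_dir / "library"
    backup = rocblas_dir / backup_name

    # Refuse to overwrite an earlier backup of the original files.
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")

    if library.exists():
        library.rename(backup)  # keep the original files around

    shutil.copytree(new_library, library)  # put the replacement in place
    return backup
```

Keeping the backup folder makes it easy to revert: delete the new `library` folder and rename the backup back to `library`.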
If your GPU model is not listed in the HIP SDK column, or is not covered by the list above, follow the instructions in the ROCm Support guide to build your own rocBLAS libraries.
(Note: Building your own libraries is not for the faint of heart.)
Install SD.Next;
git clone https://github.com/vladmandic/automatic
then
cd automatic
then
webui.bat --use-zluda --debug --autolaunch
or Update SD.Next
(from a current SD.Next install folder)
venv\scripts\activate
pip uninstall -y torch-directml torch
deactivate
git pull
webui.bat --use-zluda --debug --autolaunch --reinstall
(after running successfully once, you can remove `--reinstall`)
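If you launch SD.Next from your own script, the two command lines above differ only in the `--reinstall` flag. The sketch below assembles them as an argument list. `build_launch_command` is a hypothetical helper, and the flags are exactly the ones shown in this guide.

```python
def build_launch_command(reinstall=False):
    """Return the webui.bat command line from this guide as an argument list."""
    command = ["webui.bat", "--use-zluda", "--debug", "--autolaunch"]
    if reinstall:
        # Only needed for the first run after updating an existing install.
        command.append("--reinstall")
    return command


print(" ".join(build_launch_command(reinstall=True)))
# prints: webui.bat --use-zluda --debug --autolaunch --reinstall
```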
Note: ZLUDA functions best with the Diffusers backend, where certain Diffusers-only options are available.
After the UI starts, head over to System Tab > Compute Settings.
Set "Attention optimization method" to "Dynamic Attention BMM", then click Apply settings.
Now, try to generate something.
The first generation triggers a compilation step that takes a fair while (10-15 minutes, or even longer; some reports state over an hour), but this compilation should only need to be done once.
Note: There will be no progress bar, as this is done by ZLUDA and not SD.Next. Eventually your image will start generating.
| | DirectML | ZLUDA |
|---|---|---|
| Speed | Slower | Faster |
| VRAM usage | More | Less |
| VRAM GC | ❌ | ✅ |
| Training | * | ✅ |
| Flash Attention | ❌ | ❌ |
| FFT | ❓ | ✅ |
| FFTW | ❓ | ❌ |
| DNN | ❓ | 🚧 |
| RTC | ❓ | ❌ |
| Source Code | Closed | Open |
| Python | <=3.10 | Same as CUDA |

*: Reportedly possible, but uses too much VRAM to train Stable Diffusion models, LoRAs, etc.
DTYPE | |
---|---|
FP64 | ✅ |
FP32 | ✅ |
FP16 | ✅ |
BF16 | ✅ |
LONG | ✅ |
INT8 | ✅* |
UINT8 | ✅* |
INT4 | ❓ |
FP8 | ❌ |
BF8 | ❌ |
*: Not tested.
Sections below are optional and highly experimental, and aren't required to start generating images. Ensure you can generate images first before trying these.
Start SD.Next, head on over to System Tab > Compute Settings.
Scroll down to "Model Compile" and tick the 'Model', 'VAE', and 'Text Encoder' boxes.
Select "deep-cache" as your Model compile backend.
Apply and Shutdown, and restart SD.Next.
© SD.Next