Stable Cascade
Original repo: https://github.com/Stability-AI/StableCascade
Note: these steps will be removed in the release version
- install SD.Next as usual and start it once so all requirements are installed
- install the `diffusers` dev branch: `pip uninstall diffusers` followed by `pip install git+https://github.com/huggingface/diffusers/`
- use SD.Next from the dev branch: `git checkout dev`
- start with the `--experimental` command line flag: `./webui --experimental --debug`
- Set your compute precision in Settings -> Compute -> Precision to either BF16 (if supported) or FP32 (if not supported)
- Enable model offloading in Settings -> Diffusers -> Model CPU offload
  without this, Stable Cascade will use ~16GB of VRAM
- Select the model from Networks -> Models -> Reference
  it will be downloaded automatically on first use and loaded into SD.Next
  attempting to load a manually downloaded safetensors file is not supported, as the model requires special handling
With correct tuning, it is possible to run Stable Cascade on an 8GB VRAM GPU, but real performance leaves something to be desired: 1024x1024 on an RTX 4090 using BF16 barely reaches 4 it/s
- Prompt & Negative prompt: as usual
- Width & Height: as usual
- CFG scale: used to condition the prior model, reference value is ~4
- Secondary CFG scale: used to condition the decoder model, reference value is ~1
- Steps: used to control the number of steps of the prior model
- Refiner steps: used to control the number of steps of the decoder model
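Under the hood these UI values map onto two diffusers pipelines. The sketch below is a minimal illustration, assuming the `StableCascadePriorPipeline`/`StableCascadeDecoderPipeline` API from the diffusers dev branch; the step counts and the exact way SD.Next forwards its settings are assumptions, not SD.Next's actual code:

```python
# Reference values from the list above; mapping them to diffusers kwargs
# is an assumption about how SD.Next forwards these settings.
PRIOR_CFG = 4.0      # "CFG scale" -> prior guidance_scale
DECODER_CFG = 1.0    # "Secondary CFG scale" -> decoder guidance_scale
PRIOR_STEPS = 20     # "Steps" (illustrative value)
DECODER_STEPS = 10   # "Refiner steps" (illustrative value)

def generate(prompt: str):
    """Run the prior pipeline (Stage C) and then the decoder pipeline (Stages B + A)."""
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    prior = StableCascadePriorPipeline.from_pretrained(
        "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16)
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        "stabilityai/stable-cascade", torch_dtype=torch.bfloat16)
    # Equivalent of Settings -> Diffusers -> Model CPU offload;
    # without it the full model needs ~16GB of VRAM
    prior.enable_model_cpu_offload()
    decoder.enable_model_cpu_offload()

    prior_out = prior(prompt=prompt, guidance_scale=PRIOR_CFG,
                      num_inference_steps=PRIOR_STEPS)
    return decoder(image_embeddings=prior_out.image_embeddings, prompt=prompt,
                   guidance_scale=DECODER_CFG,
                   num_inference_steps=DECODER_STEPS).images[0]
```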
- Sampler: set to Default before loading the model
  Stable Cascade has its own sampler and results with standard samplers will look suboptimal
  the built-in sampler is DDIM/DDPM based, so if you want to experiment, at least use a similar sampler
- The default variation that will be downloaded and loaded is the Full model with BF16 precision
- If model download fails, simply retry it, it will continue from where it left off
- Model consists of 3 stages split into 2 pipelines, which are executed as C -> B -> A:
  - Prior pipeline: 8.9GB total = 1.3GB TextEncoder + 0.5GB ImageEncoder + 7GB Stage C UNet
  - Decoder pipeline: 4.4GB total = 3.0GB Stage B Decoder + 1.4GB Stage A VQGAN VAE
  Note: this is included as a reference only, as loading different variations is currently not supported
Stable Cascade is a 3-stage model split into two pipelines (the so-called prior and decoder) and comes in two main variations: Full and Lite
You can select which one to use from Networks -> Models -> Reference
Additionally, each variation comes in 3 different precisions: FP32, BF16, and FP16
Note: FP16 is an unofficial version of the model by @KohakuBlueleaf, fixed to work with FP16 precision, and may produce slightly different output
Which precision gets loaded depends on:
- your user preference in Settings -> Compute -> Precision
- GPU compatibility, as not all GPUs support all precision types
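The selection rule described above can be sketched as follows; the function name and the exact fallback order are illustrative assumptions, not SD.Next's actual code:

```python
def pick_precision(preferred: str, gpu_supports: set) -> str:
    """Hypothetical sketch: honor the user's Settings -> Compute -> Precision
    choice when the GPU supports it, otherwise fall back to FP32, which any
    GPU can run."""
    if preferred in gpu_supports:
        return preferred
    return "FP32"

# e.g. a card without BF16 support falls back to FP32
print(pick_precision("BF16", {"FP16", "FP32"}))  # -> FP32
```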
Stage A and auxiliary model sizes are fixed and noted above
Stage B and Stage C model sizes depend on the variation and precision used
| Variation | Precision | Stage B | Stage C |
|---|---|---|---|
| Full | FP32 | 6.2GB | 14GB |
| Full | BF16 | 3.1GB | 7GB |
| Full | FP16 | N/A | 7GB |
| Lite | FP32 | 2.8GB | 4GB |
| Lite | BF16 | 1.4GB | 2GB |
| Lite | FP16 | N/A | N/A |
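For planning downloads, the table can be encoded as a small lookup; the fixed ~1.3GB TextEncoder, ~0.5GB ImageEncoder, and ~1.4GB Stage A costs come from the component breakdown above, while the helper itself is only an illustration:

```python
# Stage B / Stage C sizes in GB from the table above; None = not available
SIZES = {
    ("Full", "FP32"): (6.2, 14.0),
    ("Full", "BF16"): (3.1, 7.0),
    ("Full", "FP16"): (None, 7.0),
    ("Lite", "FP32"): (2.8, 4.0),
    ("Lite", "BF16"): (1.4, 2.0),
    ("Lite", "FP16"): (None, None),
}
# Fixed components: TextEncoder + ImageEncoder + Stage A VQGAN VAE
FIXED = 1.3 + 0.5 + 1.4

def total_size(variation: str, precision: str):
    """Approximate total download size for one variation/precision combo,
    or None where that combo is not available."""
    stage_b, stage_c = SIZES[(variation, precision)]
    if stage_b is None or stage_c is None:
        return None
    return round(FIXED + stage_b + stage_c, 1)
```

For example, `total_size("Full", "BF16")` comes out to roughly 13.3GB for the default Full/BF16 variation.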