Black image with Vulkan + SD3 medium #560

Open
olivbrau opened this issue Jan 9, 2025 · 0 comments

olivbrau commented Jan 9, 2025

Hi everybody!
I'm still trying to run recent models with only 4 GB of VRAM...
This time, I tried SD3 medium.
I quantized it to Q4_0, and I only get black images. (I have checked that this Q4_0 version works well on the CPU backend and produces correct images.)
I should also mention that smaller models such as SD 1.4 work well on the Vulkan backend.
With Q4_0, the VRAM required is below 2 GB, so it should fit on my RTX A1000 (4 GB).
I tried different sampling methods (euler, lcm), various numbers of steps, and different cfg_scale values (7, 4.5).
I also tried Q5_0 quantization, with the same black result.
Does anybody have a clue?

Here is an example log (from the Q5_0 run):

D:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\inference_tool_Vulkan_2024_11_30>sd -m "..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors" --vae-on-cpu --sampling-method lcm --steps 10 --cfg-scale 4.5 -H 512 -W 512 -s 42 -t 20 -p "a cute cat" -v
Option:
n_threads: 20
mode: txt2img
model_path: ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
t5xxl_path:
diffusion_model_path:
vae_path:
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:true
diffusion flash attention:false
strength(control): 0.90
prompt: a cute cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 4.50
slg_scale: 0.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: lcm
schedule: default
sample_steps: 10
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:168 - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA RTX A1000 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
ggml_vulkan: Compiling shaders..............................Done!
[INFO ] stable-diffusion.cpp:191 - loading model from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors'
[INFO ] model.cpp:885 - load ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors using gguf format
[DEBUG] model.cpp:902 - init from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors'
[INFO ] stable-diffusion.cpp:238 - Version: SD3.x
[INFO ] stable-diffusion.cpp:271 - Weight type: q5_0
[INFO ] stable-diffusion.cpp:272 - Conditioner weight type: q5_0
[INFO ] stable-diffusion.cpp:273 - Diffusion model weight type: q5_0
[INFO ] stable-diffusion.cpp:274 - VAE weight type: q5_0
[DEBUG] stable-diffusion.cpp:276 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:315 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:318 - CLIP: Using CPU backend
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[INFO ] mmdit.hpp:706 - MMDiT layers: 24 (including 0 MMDiT-x layers)
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 81.25 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 462.63 MB(RAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1075 - t5 params backend buffer size = 3123.05 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1075 - mmdit params backend buffer size = 1601.66 MB(VRAM) (491 tensors)
[INFO ] stable-diffusion.cpp:350 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1075 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:413 - loading weights
[DEBUG] model.cpp:1645 - loading tensors from ..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors
[INFO ] model.cpp:1809 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | q5_0 | 2 [4096, 32128, 1, 1, 1]' in model file
[INFO ] stable-diffusion.cpp:512 - total params memory size = 5363.17MB (VRAM 1601.66MB, RAM 3761.51MB): clip 3666.93MB(RAM), unet 1601.66MB(VRAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:516 - loading model from '..\StableDiffusion 3 medium Q5\sd3_medium_incl_clips_t5xxl_q5_0.safetensors' completed, taking 8.52s
[INFO ] stable-diffusion.cpp:530 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:590 - finished loaded file
[DEBUG] stable-diffusion.cpp:1464 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1194 - prompt after extract and remove lora: "a cute cat"
[INFO ] stable-diffusion.cpp:673 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1199 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:690 - parse 'a cute cat' to [['a cute cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:923 - computing condition graph completed, taking 3755 ms
[DEBUG] conditioner.hpp:690 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 77
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736 - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - t5 compute buffer size: 11.94 MB(RAM)
[DEBUG] conditioner.hpp:923 - computing condition graph completed, taking 3728 ms
[INFO ] stable-diffusion.cpp:1332 - get_learned_condition completed, taking 7490 ms
[INFO ] stable-diffusion.cpp:1355 - sampling using LCM method
[INFO ] stable-diffusion.cpp:1359 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1026 - mmdit compute buffer size: 169.64 MB(VRAM)
|==================================================| 10/10 - 2.35s/it
[INFO ] stable-diffusion.cpp:1395 - sampling completed, taking 24.55s
[INFO ] stable-diffusion.cpp:1403 - generating 1 latent images completed, taking 24.93s
[INFO ] stable-diffusion.cpp:1406 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1026 - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:1045 - computing vae [mode: DECODE] graph completed, taking 15.97s
[INFO ] stable-diffusion.cpp:1416 - latent 1 decoded, taking 15.97s
[INFO ] stable-diffusion.cpp:1420 - decode_first_stage completed, taking 15.97s
[INFO ] stable-diffusion.cpp:1539 - txt2img completed in 48.40s
save result image to 'output.png'
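As a rough check of the VRAM estimate above, the buffer sizes reported in this log can be summed per location. In this run CLIP, T5 and the VAE stay on the CPU, so only the MMDiT weights and the MMDiT compute buffer are reported as VRAM. The sketch below is a hypothetical helper (`buffer_totals` and `BUFFER_RE` are names made up for illustration) that assumes the two line formats seen in this log; it only counts what the log reports, so driver overhead, compiled shaders and any other allocations are not included and real usage will presumably be somewhat higher.

```python
import re

# Buffer-size lines copied from the log above (all other log lines omitted).
LOG_EXCERPT = """\
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 81.25 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 462.63 MB(RAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1075 - t5 params backend buffer size = 3123.05 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1075 - mmdit params backend buffer size = 1601.66 MB(VRAM) (491 tensors)
[DEBUG] ggml_extend.hpp:1075 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] ggml_extend.hpp:1026 - mmdit compute buffer size: 169.64 MB(VRAM)
[DEBUG] ggml_extend.hpp:1026 - vae compute buffer size: 1664.00 MB(RAM)
"""

# Assumed formats (as they appear in this log):
#   "<name> params backend buffer size = X MB(LOC)"
#   "<name> compute buffer size: X MB(LOC)"
BUFFER_RE = re.compile(
    r"(\w+) (?:params backend|compute) buffer size(?: =|:) ([\d.]+) MB\((VRAM|RAM)\)"
)


def buffer_totals(log: str) -> dict[str, float]:
    """Sum the reported buffer sizes (in MB) for each location, 'VRAM' or 'RAM'."""
    totals = {"VRAM": 0.0, "RAM": 0.0}
    for _name, size, location in BUFFER_RE.findall(log):
        totals[location] += float(size)
    # Round to the two decimals the log itself uses.
    return {loc: round(mb, 2) for loc, mb in totals.items()}


if __name__ == "__main__":
    vram_mb = buffer_totals(LOG_EXCERPT)["VRAM"]
    print(f"Reported VRAM buffers: {vram_mb:.2f} MB ({vram_mb / 1024:.2f} GiB)")
```

For this Q5_0 run the script prints `Reported VRAM buffers: 1771.30 MB (1.73 GiB)`, i.e. 1601.66 MB of weights plus a 169.64 MB compute buffer, which is consistent with the model fitting in 4 GB.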
