CUDA error: the function failed to launch on the GPU #579
Comments
Are you using a custom build, or the release? I suspect the CUDA build in the latest release doesn't support the Ampere (RTX 30) architecture, for some reason unknown to me (no Ada (RTX 40) or Hopper support either). So you would have to build it from source on your end to get it to work, or use something else like the Vulkan build. https://github.com/leejet/stable-diffusion.cpp/blob/master/.github/workflows/build.yml#L166
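If building from source, something like this is the general shape (a sketch, not canonical instructions: the `SD_CUDA` CMake option name and the architecture list are assumptions, and older revisions used `SD_CUBLAS` instead, so check the README for your checkout):

```sh
# Sketch: build stable-diffusion.cpp from source with device code for
# Ampere (sm_86), Ada (sm_89), and Hopper (sm_90).
# SD_CUDA is assumed here; verify the option name in the project README.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
cmake -B build -DSD_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86;89;90"
cmake --build build --config Release
```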
I am using the release. It doesn't support Ampere? That's very strange. So the stable-diffusion.cpp releases only work with old GPUs? That makes them useless for general folk, because most people just use the released exe rather than building it every time. I think that support should be added ASAP.
Yes, that's not something I expected. Vulkan should work though, but performance might not be as good. @ag2s20150909 is there anything preventing us from adding the more recent CUDA architectures to the CI build?
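For illustration, the change might just be a matter of passing a wider architecture list to CMake in that workflow step (hypothetical values covering Pascal through Hopper; the actual flags and step layout in build.yml may differ):

```sh
# Hypothetical CI configure line; each added architecture embeds extra
# device code, so binary size and build time grow with the list.
cmake .. -DSD_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="61;70;75;80;86;89;90"
```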
Maybe
Just trying the example on the readme page with JuggernautXL, but it fails:
sd.exe -m "G:\AI\Image\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors" --cfg-scale 7.5 --steps 35 --sampling-method euler -H 1024 -W 1024 --seed 42 --diffusion-fa -p "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3070 Ti, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:195 - loading model from 'G:\AI\Image\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors'
[INFO ] model.cpp:888 - load G:\AI\Image\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:242 - Version: SDXL
[INFO ] stable-diffusion.cpp:275 - Weight type: f16
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
[WARN ] stable-diffusion.cpp:289 - !!!It looks like you are using SDXL model. If you find that the generated images are completely black, try specifying SDXL VAE FP16 Fix with the --vae parameter. You can find it here: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors
[INFO ] stable-diffusion.cpp:326 - Using flash attention in the diffusion model
|==================================================| 2641/2641 - 333.33it/s
[INFO ] stable-diffusion.cpp:516 - total params memory size = 8113.89MB (VRAM 8113.89MB, RAM 0.00MB): clip 3119.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:520 - loading model from 'G:\AI\Image\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors' completed, taking 6.22s
[INFO ] stable-diffusion.cpp:554 - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:688 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1241 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1374 - get_learned_condition completed, taking 1138 ms
[INFO ] stable-diffusion.cpp:1397 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1434 - generating image: 1/1 - seed 42
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\template-instances../fattn-wmma-f16.cuh:422: ERROR: CUDA kernel flash_attn_ext_f16 has no device code compatible with CUDA arch 600. ggml-cuda.cu was compiled for: 600
(the line above repeats 32 times in total)
CUDA error: the function failed to launch on the GPU
current device: 0, in function ggml_cuda_op_mul_mat_cublas at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:1151
cublasSgemm_v2(ctx.cublas_handle(id), CUBLAS_OP_T, CUBLAS_OP_N, row_diff, src1_ncols, ne10, &alpha, src0_ddf_i, ne00, src1_ddf1_i, ne10, &beta, dst_dd_i, ldc)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:70: CUDA error