
SD3.5 - Workaround for Do not promote FP8 error #2220

Draft · wants to merge 1 commit into base: sd35

Conversation

saunderez

Workaround the FP8 Do Not Promote error by casting to FP32 first.
Check both tensors are on the same device before using.

Make sure both tensors are on the same device and cast them to FP32 to avoid the "do not promote FP8" error, allowing FP8 models to work.
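The workaround described above can be sketched as a small PyTorch helper (a minimal illustration of the idea, not the actual patch in this PR; the function name is made up):

```python
import torch

def add_with_fp32_cast(x_embed: torch.Tensor, pos_embed: torch.Tensor) -> torch.Tensor:
    """Add two embedding tensors that may be stored in FP8.

    PyTorch's type-promotion rules refuse to promote float8 dtypes,
    raising a "do not promote" error on mixed-dtype ops, so both
    operands are first moved to the same device and cast to FP32.
    """
    device = x_embed.device
    x32 = x_embed.to(device=device, dtype=torch.float32)
    p32 = pos_embed.to(device=device, dtype=torch.float32)
    return x32 + p32
```

The FP32 cast is the safe-but-slow choice; as discussed below, casting to FP16 instead recovers most of the speed.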
@saunderez
Author

It's slow but it works; as you can see, I'm using the FP8 version of SD3.5 Large.

[image]

Result:
[image]

@VeteranXT

There is an SD3.5 Medium version of it; I recommend using that.

@andy8992

andy8992 commented Nov 4, 2024

What's left to add 3.5 Medium support? This is needed.

@VeteranXT

Hype Train!

@andy8992

andy8992 commented Nov 7, 2024

I check this dang GitHub page every single day hoping there'll be news about this.

@saunderez
Author

When I posted this workaround I figured someone would fix it properly in a matter of days. I was mucking around with SageAttention, which magically fixed the slow generation, and I would've added that here too, but it's too hacky. Attention needs to be split out so it's all handled in one place. Back in the Auto1111 Dreambooth days I made a wrapper function for the model that worked well, but I'm not sure that's feasible with this implementation.
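The "attention handled in one place" idea could look something like a single dispatch function that every attention call routes through, so a faster kernel can be swapped in at one spot instead of patching many call sites. This is purely a hypothetical sketch; the backend names and registry here are not Forge's actual API:

```python
import torch
import torch.nn.functional as F

# Default backend; an alternative kernel (e.g. SageAttention) could be
# registered here once, rather than hacked into each call site.
_ATTENTION_BACKEND = "sdpa"

def attention(q, k, v, backend=None):
    """Route all attention through one dispatch point."""
    chosen = backend or _ATTENTION_BACKEND
    if chosen == "sdpa":
        # PyTorch's built-in fused scaled dot-product attention
        return F.scaled_dot_product_attention(q, k, v)
    # elif chosen == "sage":
    #     from sageattention import sageattn  # optional dependency
    #     return sageattn(q, k, v)
    raise ValueError(f"unknown attention backend: {chosen}")
```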

@likelovewant

likelovewant commented Nov 21, 2024

By changing float32 to float16, the speed catches up and it works:

    x_embed = self.x_embedder(x).to(torch.float16)
    pos_embed = self.cropped_pos_embed(hw).to(torch.float16).to("cuda")
    x = x_embed + pos_embed

@saunderez
Author

saunderez commented Nov 29, 2024

> By changing float32 to float16, the speed catches up and it works:
>
>     x_embed = self.x_embedder(x).to(torch.float16)
>     pos_embed = self.cropped_pos_embed(hw).to(torch.float16).to("cuda")
>     x = x_embed + pos_embed

Yeah, to be honest, I just made it cast to FP32 because that was most likely to just work. I just wanted to try out the model while I waited for official support, so when the first thing I tried worked, that was enough to satisfy my curiosity.

@VeteranXT

How did you make it work?

@likelovewant

> By changing float32 to float16, the speed catches up and it works:
>
>     x_embed = self.x_embedder(x).to(torch.float16)
>     pos_embed = self.cropped_pos_embed(hw).to(torch.float16).to("cuda")
>     x = x_embed + pos_embed
>
> Yeah, to be honest, I just made it cast to FP32 because that was most likely to just work. I just wanted to try out the model while I waited for official support, so when the first thing I tried worked, that was enough to satisfy my curiosity.

Understood. It's great that you found a quick workaround to get FP8 support, even if it's a temporary fix. Thanks for sharing your solution. Now let's keep our fingers crossed for the official Forge SD3.5 Medium release. I've run a few tests, but unfortunately without much luck getting it to work. Let's hope lllyasviel will be able to make it happen soon, despite his busy schedule. @saunderez

@likelovewant

> How did you make it work?

Simply manually change the code as shown in https://github.com/lllyasviel/stable-diffusion-webui-forge/pull/2220/files
@VeteranXT

@VeteranXT

I did that and I get "can't recognize model..."

@likelovewant

> I did that and I get "can't recognize model..."

First, git checkout sd35 and edit the files. Besides changing float32 to float16, download the necessary files for SD3.5, e.g. the extra clip_g and t5xxl_fp8_e4m3fn.safetensors text encoders and the sd3.5_large_fp8 models, then start the program, as shown in #2183 and
#2161 (comment)
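The steps above can be summarized as follows (a setup sketch; filenames and paths are illustrative, adjust to your install):

```shell
# From the stable-diffusion-webui-forge checkout:
git checkout sd35        # switch to the SD3.5 branch

# Apply the edits from this PR (optionally changing float32 to float16
# for speed, as discussed above).

# Download the required files, e.g.:
#   - clip_g.safetensors and t5xxl_fp8_e4m3fn.safetensors (text encoders)
#   - the sd3.5_large_fp8 checkpoint
# place them in the usual models/ folders, then start the program.
```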

@VeteranXT

@danilomaiaweb

It doesn't work for me. See my Forge version:

app: stable-diffusion-webui-forge.git
updated: 2024-12-10
hash: e073e4e
url: https://github.com/lllyasviel/stable-diffusion-webui-forge.git/tree/main

My branch is origin/main.
How do I get full SD3.5 integration on my version?

Do I have to change my branch to sd35 (main...sd35)?
Thanks in advance.

@likelovewant

likelovewant commented Dec 17, 2024

> It doesn't work for me. See my Forge version:
>
> app: stable-diffusion-webui-forge.git updated: 2024-12-10 hash: e073e4e url: https://github.com/lllyasviel/stable-diffusion-webui-forge.git/tree/main
>
> My branch is origin/main. How do I get full SD3.5 integration on my version?
>
> Do I have to change my branch to sd35 (main...sd35)? Thanks in advance.

Simply
git checkout sd35
and do the edit, then launch it. If there is any error and you cannot figure it out, it's better to paste the error log here; otherwise nobody knows what's going on.
@danilomaiaweb

@VeteranXT

Well, I'm using AMD Forge, a fork of this. I did edit the files, but all I get is "can't recognize model".

@likelovewant

> Well, I'm using AMD Forge, a fork of this. I did edit the files, but all I get is "can't recognize model".

Tested on an AMD GPU via ZLUDA; it works. Perhaps you need to update your huggingface_guess, see #2161 (comment).
@VeteranXT


5 participants