Add GGUF versions for SD3.5 #136
I got it converted to GGUF F16 by modifying the convert code, but the problem is that llama-quantize doesn't know the sd3 arch. I also had to make an SD3 GGUF loader node for ComfyUI, and it works great. Stuck now at making the K-quant files. Probably need a patch with directions for llama.cpp.
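For readers who want to see roughly what "converted to GGUF F16 by modifying the convert code" looks like, here is a minimal sketch using the gguf Python package's writer with a custom architecture string. The "sd3" arch name, the file paths, and the FP16 cast are assumptions for illustration, not the repo's actual convert.py.

```python
# Minimal sketch (not the actual convert.py): write an F16 GGUF with a custom
# architecture string using the gguf Python package. Paths and the "sd3" arch
# name are assumptions for illustration only.
import gguf
import numpy as np
from safetensors.torch import load_file

state_dict = load_file("sd3_medium.safetensors")  # placeholder input path

writer = gguf.GGUFWriter("sd3_medium-F16.gguf", arch="sd3")
for name, tensor in state_dict.items():
    # Cast everything to FP16 for a simple F16-only file.
    data = tensor.float().cpu().numpy().astype(np.float16)
    writer.add_tensor(name, data)

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```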
I've tried with SD3 before, and idk what to do about this specific weight, because the first dimension can't be 1 in any of the C++ code, so it just gets stripped during conversion. The diffusers-format weights don't have that, but those ones have the q/k/v split, so it'll just fail on the reshape IIRC.

@wolfden I don't think running it in FP16 helps much, because that never involves any of the custom code anyway. Also, if you have to pass in the model type, doesn't that mean the auto model detection in comfy has failed and that your checkpoint is in an invalid format? (Also, we can pass the model type for SD3 in the default node as a kwarg without needing an extra node, since we can just add a switch based on the model arch returned from the loader.)
I agree that at F16 it doesn't help much, but that wasn't the goal. I'm not a coder, but I can generally figure things out and hack it together. Things work, but again, back to the llama.cpp issue.
@wolfden All good, happy to hear if anyone figures out anything lol. @cchance27 Actually, it looks like you are on an older version of the convert script. The current master one does pick up SD3 correctly, for the Python part at least.
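For context on what "picks up SD3 correctly" means, the convert script infers the model family from the state dict keys. Below is a rough sketch of that kind of detection; the key prefixes (joint_blocks for SD3's MMDiT, double_blocks for Flux) and the checkpoint prefix handling are assumptions based on the reference checkpoint naming, not a copy of the repo's exact logic.

```python
# Rough sketch of architecture auto-detection by key prefix. The prefixes
# ("joint_blocks." for SD3/MMDiT, "double_blocks." for Flux) are assumptions
# for illustration, not the repository's exact code.
def detect_arch(state_dict: dict) -> str:
    keys = list(state_dict.keys())
    ckpt_prefix = "model.diffusion_model."  # assumed prefix in combined checkpoints
    if any(k.startswith(ckpt_prefix) for k in keys):
        keys = [k[len(ckpt_prefix):] for k in keys]
    if any(k.startswith("joint_blocks.") for k in keys):
        return "sd3"
    if any(k.startswith("double_blocks.") for k in keys):
        return "flux"
    raise ValueError("Unknown model architecture")
```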
Well shit, I'll update over to that version. Have you already figured out which layers need to be kept at F16/F32 in the lcpp patch?
I have a rewritten version of the C++ logic part from a while ago that works with SD3, but I haven't figured out the issue with that pos_embed key being reshaped yet.
Well, I don't think the Python convert + comfy loader code needs any extensive changes? Using this file as a base, it does pick up the format correctly and seems to work fine. I've added SD3 as a valid arch to the loader in 0ef158d. I wonder how janky it would be to just patch that single key in post after the llama.cpp convert step, just to have some weights for people to download lol.
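One way "patching that single key in post" could look is restoring the stripped leading dimension on pos_embed after the GGUF is read back. A hedged sketch is below; the "pos_embed" key name comes from the thread, while the expected (1, num_positions, hidden_size) layout is an assumption about the MMDiT weights.

```python
# Hedged sketch: restore a leading batch dimension that was stripped from
# pos_embed during the GGUF round trip. The 3D (1, N, C) layout is assumed.
import torch

def fix_pos_embed(state_dict: dict) -> dict:
    key = "pos_embed"
    if key in state_dict and state_dict[key].ndim == 2:
        # GGUF conversion dropped the leading 1; add it back for comfy.
        state_dict[key] = state_dict[key].unsqueeze(0)
    return state_dict
```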
Not sure why the git pull didn't pull the latest convert script. Tested the new node.py and it's working fine here also.
I think just having GGUF FP16 weights could be better than safetensors. There is an issue with the safetensors library on Windows ( comfyanonymous/ComfyUI#4480 (comment) ): it needs about double the RAM of the safetensors FP16 file size while loading.
See above; he confirmed the convert.py already works for FP16 GGUF on the latest version. It's the quantizing via the llama.cpp patch that's currently being looked into, as the pos_embed seems to be getting messed up when quantizing.
Some more info about the issue at hand, in case anyone has any better ideas. The SAI reference checkpoint has a weight whose first dimension is 1, and that dimension gets stripped during conversion. In comfy, this just manifests as a shape mismatch while loading, as expected:
@cchance27 How did you get your version to bypass that specific issue, assuming you even ran into it lol. Also, here's the current WIP patch with the issue above still present, but with some logic for SD3 added in and the main parts rewritten: lcpp_new.patch. It's for the newest llama.cpp version, since I was curious if it'd solve the issue (it didn't).
I stupidly wiped the stash I had when I saw others were already working on it. The lcpp patch you shared is from a different version than the one I was playing with; it had the checks for leaving some things at FP32 and some at FP16. My first hack attempt that worked wasn't excluding the proper layers from being quanted, except for final_layer. Is it possible there's code elsewhere in the original quantizer for pos_embed? Pretty sure that's a layer name they use in some LLMs as well, isn't it?
Haha, here's something funny: that Q5 model I was having black screens with and erased because I figured I had messed it up might not have been messed up, because for some reason I just got a black render halfway through the steps on the standard safetensors generation :S
Odd, wonder if it needs BF16 like flux does for the inference. The weights are in FP16 though, so that'd be odd. The VAE hash is different from SD3 but they seem close enough at least.

BTW, this is the part that's causing the dimension to be stripped, since yeah, it just checks if it's >1 to see whether it should keep it or not, and due to the reversed order it just gets stripped, I think. (Can't manually set it because ggml_n_dims gets called directly.)

```c
int ggml_n_dims(const struct ggml_tensor * tensor) {
    for (int i = GGML_MAX_DIMS - 1; i >= 1; --i) {
        if (tensor->ne[i] > 1) {
            return i + 1;
        }
    }
    return 1;
}
```
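To see why the leading 1 disappears: ggml stores dimensions in reversed (ne) order and recomputes the dimension count by scanning from the highest axis down, returning at the first axis larger than 1, so a tensor shaped (1, N, C) in PyTorch comes back as 2-D. Below is a toy Python port of that logic; the concrete numbers are illustrative, not the real pos_embed shape.

```python
# Toy illustration (Python port of ggml_n_dims) of how a (1, N, C) PyTorch
# shape loses its leading dimension after the GGUF round trip. The concrete
# numbers are made up for illustration.
GGML_MAX_DIMS = 4

def ggml_n_dims(ne):
    # ne is the ggml dimension array: reversed torch shape, padded with 1s.
    for i in range(GGML_MAX_DIMS - 1, 0, -1):
        if ne[i] > 1:
            return i + 1
    return 1

torch_shape = (1, 4096, 1536)  # assumed (1, N, C) layout
ne = list(reversed(torch_shape)) + [1] * (GGML_MAX_DIMS - len(torch_shape))
print(ggml_n_dims(ne))  # -> 2, so the tensor is written back as (4096, 1536)
```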
@city96 What tag of llama.cpp are you applying the new patch to? For some reason mine isn't lining up the line numbers correctly.
@cchance27 I pushed the patch plus the actual tag I was targeting. Apparently SD3.5 also uses a different shape for the keys, which is why K quants don't work. We could use the reshape logic that's also used for SDXL, but that needs possible future logic changes.
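For context on why the shape matters: the K-quant formats pack weights into super-blocks of 256 values along a row, so rows whose length is not a multiple of 256 can't be quantized directly (2432, the hidden size seen in the logs later in this thread, is not a multiple of 256). Below is a hedged sketch of one way the "reshape/pad before quantizing" idea could look; it is not the patch's actual implementation.

```python
# Hedged sketch: pad a weight's row length up to a multiple of the K-quant
# super-block size (256) before quantization. Illustrative only, not the
# patch's actual reshape logic; the loader would need to undo the padding.
import numpy as np

QK_K = 256  # K-quant super-block size

def pad_rows_for_k_quant(weight: np.ndarray) -> np.ndarray:
    rows, cols = weight.shape
    pad = (-cols) % QK_K  # columns needed to reach a multiple of 256
    if pad == 0:
        return weight
    return np.pad(weight, ((0, 0), (0, pad)))

w = np.zeros((2432, 2432), dtype=np.float16)
print(pad_rows_for_k_quant(w).shape)  # -> (2432, 2560)
```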
I tried a Q8_0 and a Q4_0, and with both I get an error when it loads:
Huh, strange. It works with SD3 medium, but with large it only works with the (old) gguf-py from the llama.cpp repo, not the package one.
I added a print line into that gguf-py where it's crashing, and it seems to be on y_embedder.mlp.2.weight [2432 2432] 5914624, which is odd because... that's an F16, so not even one of the Q8 layers? It's also weird because it didn't crash on the t_embedder.mlp.2.weight that it processed before that, which looks to be the same size and np_dims. Part of the log I added in the gguf-py site-package:

Why would the y_embedder crash but not the t_embedder step?
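A hedged sketch of that kind of debugging, using the gguf package's reader to print each tensor as it's walked so the offending key is the last thing logged before the crash (the file path is a placeholder):

```python
# Hedged sketch: walk a GGUF file with gguf-py's reader and print each
# tensor's name, shape, element count and type. The path is a placeholder.
import gguf

reader = gguf.GGUFReader("sd3.5_large-Q8_0.gguf")
for t in reader.tensors:
    print(t.name, list(t.shape), t.n_elements, t.tensor_type)
```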
Okay, I think I got it. I think modifying the metadata after the initial bit where it adds it might be corrupting the data parts, possibly due to the data length changing. That might be because of the type I'm passing it.
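A toy illustration of the "data length changing" failure mode: GGUF key/value entries are serialized with fixed-width types, so writing the same number as a 32-bit versus 64-bit integer changes the byte length of the metadata block and shifts every offset computed after it. This is a generic illustration, not the exact bug in the patch.

```python
# Toy illustration: the same value serialized as uint32 vs uint64 occupies a
# different number of bytes, so changing the type after offsets were computed
# shifts everything that follows. Generic example, not the actual patch code.
import struct

value = 2432
as_u32 = struct.pack("<I", value)  # 4 bytes
as_u64 = struct.pack("<Q", value)  # 8 bytes
print(len(as_u32), len(as_u64))    # -> 4 8
```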
@cchance27 Could you re-test with the new patch?
Looking good. Updated it and rebuilt with the new patch; just did a Q8_0 and got past the loader, diffusing now (I'm on Mac, it takes a bit to diffuse XD).
Gonna test all the major quants and then upload them to civit in a couple |
Sounds good, I'll put some up on huggingface in that case and then we should be all covered, I think.
Sounds good, thanks for all the help figuring it out. I'd never have tracked down that uint issue lol, too long since I fought with C++ XD.
I guess I'll look at the K quants separately, sometime on the weekend maybe; it also depends on how much community adoption it'll get. I think a lot of people prefer the legacy _0 and _1 quants for speed alone lol. Also, in case you're lazy like me:

```python
name = "sd3.5_large"
path = "~/ComfyUI/models/unet"
# K quants left out until they work: "Q6_K", "Q5_K_M", "Q5_K_S", "Q4_K_M", "Q4_K_S", "Q3_K_M", "Q3_K_S", "Q2_K"
for q in ["Q8_0", "Q5_1", "Q5_0", "Q4_1", "Q4_0"]:
    print(f"./llama-quantize {path}/{name}-F16.gguf {path}/{name}-{q}.gguf {q}")
```
Here's the basic quants: |
Got it working on 8gb vram! |
Thanks, legend! You’re my hero. |
Thanks All! Awesome! |
When loading SD3.5 GGUF in Comfy, do you have to use a regular checkpoint loader on the original safetensors to get the VAE? Or is the VAE included in SD3.5, and we just can't output it yet from the GGUF? |
@Jonseed You can either download it separately from the SD3 huggingface repo (VAE folder) or use the SaveVAE node on the checkpoint you already have. The gguf files only ever contain the UNET part. (Also see this comment for more info: #139 (comment) )
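For anyone who prefers extracting the VAE with a script rather than a node, a hedged sketch is below. It assumes the combined checkpoint stores VAE weights under a "first_stage_model." prefix; the prefix and file paths may differ between releases.

```python
# Hedged sketch: pull the VAE weights out of a combined safetensors checkpoint.
# The "first_stage_model." prefix and paths are assumptions; adjust as needed.
from safetensors.torch import load_file, save_file

sd = load_file("sd3.5_large.safetensors")  # placeholder path
prefix = "first_stage_model."
vae = {k[len(prefix):]: v for k, v in sd.items() if k.startswith(prefix)}
save_file(vae, "sd3.5_large_vae.safetensors")
```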
SD3.5 large was just released!
https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo
https://huggingface.co/stabilityai/stable-diffusion-3.5-large