Add GGUF versions for SD3.5 #136
I got it converted to GGUF F16 by modifying the convert code, but the problem is that llama-quantize doesn't know the sd3 arch. I also had to make an SD3 GGUF loader node for ComfyUI, and it works great. Stuck now at making the K-quant files. Probably need a patch with directions for llama.cpp.
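For readers who want to see roughly what "converted to GGUF F16 by modifying the convert code" looks like, here is a minimal sketch using the gguf Python package's writer with a custom architecture string. The "sd3" arch name, the file paths, and the FP16 cast are assumptions for illustration, not the repo's actual convert.py.

```python
# Minimal sketch (not the actual convert.py): write an F16 GGUF with a custom
# architecture string using the gguf Python package. Paths and the "sd3" arch
# name are assumptions for illustration only.
import gguf
import numpy as np
from safetensors.torch import load_file

state_dict = load_file("sd3_medium.safetensors")  # placeholder input path

writer = gguf.GGUFWriter("sd3_medium-F16.gguf", arch="sd3")
for name, tensor in state_dict.items():
    # Cast everything to FP16 for a simple F16-only file.
    data = tensor.float().cpu().numpy().astype(np.float16)
    writer.add_tensor(name, data)

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```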
I've tried with SD3 before, and idk what to do about this specific weight, because the first dimension can't be 1 in any of the C++ code, so it just gets stripped during conversion. The diffusers-format weights don't have that, but those ones have the q/k/v split, so it'll just fail on the reshape IIRC.

@wolfden I don't think running it in FP16 helps much, because that never involves any of the custom code anyway. Also, if you have to pass in the model type, doesn't that mean the auto model detection in comfy has failed and that your checkpoint is in an invalid format? (Also, we can pass the model type for SD3 in the default node as a kwarg without needing an extra node, since we can just add a switch based on the model arch returned from the loader.)
I agree that at F16 it doesn't help much, but that wasn't the goal. I'm not a coder, but I can generally figure things out and hack it together. Things work, but again, back to the llama.cpp issue.
@wolfden All good, happy to hear if anyone figures out anything lol. @cchance27 Actually, it looks like you are on an older version of the convert script. The current master one does pick up SD3 correctly, for the Python part at least.
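For context on what "picks up SD3 correctly" means, the convert script infers the model family from the state dict keys. Below is a rough sketch of that kind of detection; the key prefixes (joint_blocks for SD3's MMDiT, double_blocks for Flux) and the checkpoint prefix handling are assumptions based on the reference checkpoint naming, not a copy of the repo's exact logic.

```python
# Rough sketch of architecture auto-detection by key prefix. The prefixes
# ("joint_blocks." for SD3/MMDiT, "double_blocks." for Flux) are assumptions
# for illustration, not the repository's exact code.
def detect_arch(state_dict: dict) -> str:
    keys = list(state_dict.keys())
    ckpt_prefix = "model.diffusion_model."  # assumed prefix in combined checkpoints
    if any(k.startswith(ckpt_prefix) for k in keys):
        keys = [k[len(ckpt_prefix):] for k in keys]
    if any(k.startswith("joint_blocks.") for k in keys):
        return "sd3"
    if any(k.startswith("double_blocks.") for k in keys):
        return "flux"
    raise ValueError("Unknown model architecture")
```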
Well shit, I'll update over to that version. Have you already figured out which layers need to be kept at F16/F32 in the lcpp patch?
I have a rewritten version of the C++ logic part from a while ago that works with SD3, but I haven't figured out the issue with that pos_embed key being reshaped yet.
Well, I don't think the Python convert + comfy loader code needs any extensive changes? Using this file as a base, it does pick up the format correctly and seems to work fine. I've added SD3 as a valid arch to the loader in 0ef158d. I wonder how janky it would be to just patch that single key in post after the llama.cpp convert step, just to have some weights for people to download lol.
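One way "patching that single key in post" could look is restoring the stripped leading dimension on pos_embed after the GGUF is read back. A hedged sketch is below; the "pos_embed" key name comes from the thread, while the expected (1, num_positions, hidden_size) layout is an assumption about the MMDiT weights.

```python
# Hedged sketch: restore a leading batch dimension that was stripped from
# pos_embed during the GGUF round trip. The 3D (1, N, C) layout is assumed.
import torch

def fix_pos_embed(state_dict: dict) -> dict:
    key = "pos_embed"
    if key in state_dict and state_dict[key].ndim == 2:
        # GGUF conversion dropped the leading 1; add it back for comfy.
        state_dict[key] = state_dict[key].unsqueeze(0)
    return state_dict
```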
Not sure why the git pull didn't pull the latest convert script. Tested the new node.py and it's working fine here also.
I think just having GGUF FP16 weights could be better than safetensors. There is an issue with the safetensors library on Windows ( comfyanonymous/ComfyUI#4480 (comment) ): it needs about double the RAM of the safetensors FP16 file size while loading.
See above; he confirmed the convert.py already works for FP16 GGUF on the latest version. It's the quantizing via the llama.cpp patch that's currently being looked into, as the pos_embed seems to be getting messed up when quantizing.
Some more info about the issue at hand, in case anyone has any better ideas. The SAI reference checkpoint has a weight whose first dimension is 1, and that dimension gets stripped during conversion. In comfy, this just manifests as a shape mismatch while loading, as expected:
@cchance27 How did you get your version to bypass that specific issue, assuming you even ran into it lol. Also, here's the current WIP patch with the issue above still present, but with some logic for SD3 added in and the main parts rewritten: lcpp_new.patch. It's for the newest llama.cpp version, since I was curious if it'd solve the issue (it didn't).
I stupidly wiped the stash I had when I saw others were already working on it. The lcpp patch you shared is from a different version than the one I was playing with; it had the checks for leaving some things at FP32 and some at FP16. My first hack attempt that worked wasn't excluding the proper layers from being quanted, except for final_layer. Is it possible there's code elsewhere in the original quantizer for pos_embed? Pretty sure that's a layer name they use in some LLMs as well, isn't it?
Haha, here's something funny: that Q5 model I was having black screens with and erased because I figured I had messed it up might not have been messed up, because for some reason I just got a black render halfway through the steps on the standard safetensors generation :S
Odd, wonder if it needs BF16 like flux does for the inference. The weights are in FP16 though, so that'd be odd. The VAE hash is different from SD3 but they seem close enough at least.

BTW, this is the part that's causing the dimension to be stripped, since yeah, it just checks if it's >1 to see whether it should keep it or not, and due to the reversed order it just gets stripped, I think. (Can't manually set it because ggml_n_dims gets called directly.)

```c
int ggml_n_dims(const struct ggml_tensor * tensor) {
    for (int i = GGML_MAX_DIMS - 1; i >= 1; --i) {
        if (tensor->ne[i] > 1) {
            return i + 1;
        }
    }
    return 1;
}
```
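To see why the leading 1 disappears: ggml stores dimensions in reversed (ne) order and recomputes the dimension count by scanning from the highest axis down, returning at the first axis larger than 1, so a tensor shaped (1, N, C) in PyTorch comes back as 2-D. Below is a toy Python port of that logic; the concrete numbers are illustrative, not the real pos_embed shape.

```python
# Toy illustration (Python port of ggml_n_dims) of how a (1, N, C) PyTorch
# shape loses its leading dimension after the GGUF round trip. The concrete
# numbers are made up for illustration.
GGML_MAX_DIMS = 4

def ggml_n_dims(ne):
    # ne is the ggml dimension array: reversed torch shape, padded with 1s.
    for i in range(GGML_MAX_DIMS - 1, 0, -1):
        if ne[i] > 1:
            return i + 1
    return 1

torch_shape = (1, 4096, 1536)  # assumed (1, N, C) layout
ne = list(reversed(torch_shape)) + [1] * (GGML_MAX_DIMS - len(torch_shape))
print(ggml_n_dims(ne))  # -> 2, so the tensor is written back as (4096, 1536)
```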
@city96 What tag of llama.cpp are you applying the new patch to? For some reason mine isn't lining up the line numbers correctly.
@cchance27 I pushed the patch plus the actual tag I was targeting. Apparently SD3.5 also uses a different shape for the keys, which is why K quants don't work. We could use the reshape logic that's also used for SDXL, but that needs possible future logic changes.
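For context on why the shape matters: the K-quant formats pack weights into super-blocks of 256 values along a row, so rows whose length is not a multiple of 256 can't be quantized directly (2432, the hidden size seen in the logs later in this thread, is not a multiple of 256). Below is a hedged sketch of one way the "reshape/pad before quantizing" idea could look; it is not the patch's actual implementation.

```python
# Hedged sketch: pad a weight's row length up to a multiple of the K-quant
# super-block size (256) before quantization. Illustrative only, not the
# patch's actual reshape logic; the loader would need to undo the padding.
import numpy as np

QK_K = 256  # K-quant super-block size

def pad_rows_for_k_quant(weight: np.ndarray) -> np.ndarray:
    rows, cols = weight.shape
    pad = (-cols) % QK_K  # columns needed to reach a multiple of 256
    if pad == 0:
        return weight
    return np.pad(weight, ((0, 0), (0, pad)))

w = np.zeros((2432, 2432), dtype=np.float16)
print(pad_rows_for_k_quant(w).shape)  # -> (2432, 2560)
```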
I tried a Q8_0 and a Q4_0, and with both I get an error when it loads:
Huh, strange. It works with SD3 medium, but with large it only works with the (old) gguf-py from the llama.cpp repo, not the package one.
I added a print line into that gguf-py where it's crashing, and it seems to be on y_embedder.mlp.2.weight [2432 2432] 5914624, which is odd because... that's an F16, so not even one of the Q8 layers? It's also weird because it didn't crash on the t_embedder.mlp.2.weight that it processed before that, which looks to be the same size and np_dims. Part of the log I added in the gguf-py site-package:

Why would the y_embedder crash but not the t_embedder step?
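A hedged sketch of that kind of debugging, using the gguf package's reader to print each tensor as it's walked so the offending key is the last thing logged before the crash (the file path is a placeholder):

```python
# Hedged sketch: walk a GGUF file with gguf-py's reader and print each
# tensor's name, shape, element count and type. The path is a placeholder.
import gguf

reader = gguf.GGUFReader("sd3.5_large-Q8_0.gguf")
for t in reader.tensors:
    print(t.name, list(t.shape), t.n_elements, t.tensor_type)
```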
Okay, I think I got it. I think modifying the metadata after the initial bit where it adds it might be corrupting the data parts, possibly due to the data length changing. That might be because of the type I'm passing it.
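A toy illustration of the "data length changing" failure mode: GGUF key/value entries are serialized with fixed-width types, so writing the same number as a 32-bit versus 64-bit integer changes the byte length of the metadata block and shifts every offset computed after it. This is a generic illustration, not the exact bug in the patch.

```python
# Toy illustration: the same value serialized as uint32 vs uint64 occupies a
# different number of bytes, so changing the type after offsets were computed
# shifts everything that follows. Generic example, not the actual patch code.
import struct

value = 2432
as_u32 = struct.pack("<I", value)  # 4 bytes
as_u64 = struct.pack("<Q", value)  # 8 bytes
print(len(as_u32), len(as_u64))    # -> 4 8
```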
@cchance27 Could you re-test with the new patch?
Looking good. Updated it and rebuilt with the new patch; just did a Q8_0 and got past the loader, diffusing now (I'm on Mac, it takes a bit to diffuse XD).
Gonna test all the major quants and then upload them to civit in a couple |
Sounds good, I'll put some up on huggingface in that case and then we should be all covered, I think.
Sounds good, thanks for all the help figuring it out. I'd never have tracked down that uint issue lol, too long since I fought with C++ XD.
I guess I'll look at the K quants separately, sometime on the weekend maybe; it also depends on how much community adoption it'll get. I think a lot of people prefer the legacy _0 and _1 quants for speed alone lol. Also, in case you're lazy like me:

```python
name = "sd3.5_large"
path = "~/ComfyUI/models/unet"
# K quants left out until they work: "Q6_K", "Q5_K_M", "Q5_K_S", "Q4_K_M", "Q4_K_S", "Q3_K_M", "Q3_K_S", "Q2_K"
for q in ["Q8_0", "Q5_1", "Q5_0", "Q4_1", "Q4_0"]:
    print(f"./llama-quantize {path}/{name}-F16.gguf {path}/{name}-{q}.gguf {q}")
```
Here's the basic quants: |
Got it working on 8gb vram! |
Thanks, legend! You’re my hero. |
Thanks All! Awesome! |
When loading SD3.5 GGUF in Comfy, do you have to use a regular checkpoint loader on the original safetensors to get the VAE? Or is the VAE included in SD3.5, and we just can't output it yet from the GGUF? |
@Jonseed You can either download it separately from the SD3 huggingface repo (VAE folder) or use the SaveVAE node on the checkpoint you already have. The gguf files only ever contain the UNET part. (Also see this comment for more info: #139 (comment) )
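For anyone who prefers extracting the VAE with a script rather than a node, a hedged sketch is below. It assumes the combined checkpoint stores VAE weights under a "first_stage_model." prefix; the prefix and file paths may differ between releases.

```python
# Hedged sketch: pull the VAE weights out of a combined safetensors checkpoint.
# The "first_stage_model." prefix and paths are assumptions; adjust as needed.
from safetensors.torch import load_file, save_file

sd = load_file("sd3.5_large.safetensors")  # placeholder path
prefix = "first_stage_model."
vae = {k[len(prefix):]: v for k, v in sd.items() if k.startswith(prefix)}
save_file(vae, "sd3.5_large_vae.safetensors")
```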
SD3.5 large was just released!
https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo
https://huggingface.co/stabilityai/stable-diffusion-3.5-large