
t5xxl_fp8_e4m3fn_scaled.safetensors and t5-v1_1-xxl-encoder-Q8_0.gguf #138

Open
Amazon90 opened this issue Oct 24, 2024 · 3 comments

Amazon90 commented Oct 24, 2024

What are the differences between these two models, and how should I choose between them? In terms of precision, which one is higher?

https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/blob/main/t5-v1_1-xxl-encoder-Q8_0.gguf

https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/t5xxl_fp8_e4m3fn_scaled.safetensors

Kwisss commented Oct 24, 2024

I tried both but could not really see a difference in prompt following. I think the two CLIP encoders do most of the heavy lifting, though.

city96 (Owner) commented Oct 24, 2024

I think the new scaled FP8 ones should be on par with Q8_0 for T5. The logic is fairly similar from what I can tell, but I haven't tested them myself.
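
For a rough sense of how the two schemes compare numerically, here is a minimal sketch (not ComfyUI's or llama.cpp's actual code) that round-trips a random weight tensor through both: a single per-tensor scale with an e4m3fn cast, versus Q8_0-style blocks of 32 int8 values with one scale per block. It assumes PyTorch 2.1+ for the `float8_e4m3fn` dtype; 448 is the largest finite e4m3fn value and 127 the int8 maximum.

```python
# Minimal sketch, not either project's actual code: compare the
# round-trip quantization error of scaled FP8 vs Q8_0-style int8 blocks.
import torch

w = torch.randn(4096, 4096) * 0.02           # weights at a typical scale

# Scaled FP8: one scale for the whole tensor, cast to e4m3fn.
scale = w.abs().max() / 448.0                # 448 = max finite e4m3fn value
w_fp8 = (w / scale).to(torch.float8_e4m3fn)
w_fp8_dq = w_fp8.to(torch.float32) * scale   # dequantize for comparison

# Q8_0: blocks of 32 values, one scale per block, int8 payload.
blocks = w.reshape(-1, 32)
bscale = blocks.abs().amax(dim=1, keepdim=True) / 127.0
q = (blocks / bscale).round().clamp(-127, 127).to(torch.int8)
w_q8_dq = (q.float() * bscale).reshape(w.shape)

print("scaled fp8 mean abs error:", (w - w_fp8_dq).abs().mean().item())
print("q8_0       mean abs error:", (w - w_q8_dq).abs().mean().item())
```

On random data the per-block int8 scheme tends to come out ahead, since e4m3 carries only three mantissa bits, but both errors are small enough that output differences are hard to notice in practice.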

The main benefit of FP8 is that it's faster (even more so on 40XX cards with the launch arg) and natively supported in ComfyUI.

I guess the main benefit of GGUF would be lower initial system memory usage on Windows due to the way the model is loaded.
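
As a generic illustration of that loading difference (this is not ComfyUF-GGUF's actual loader; the filename just assumes the model above was saved locally): a memory-mapped file lets the OS page tensor data in on demand instead of reading the whole file into RAM up front, so the initial footprint can stay low.

```python
# Generic sketch of mmap-based loading, not ComfyUI-GGUF's actual loader.
import mmap

with open("t5-v1_1-xxl-encoder-Q8_0.gguf", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Only the pages actually touched are faulted into memory, so a
    # multi-gigabyte file does not need that much free RAM to be opened.
    print(mm[:4])  # b'GGUF' magic for a valid file
    mm.close()
```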

Amazon90 (Author) commented

Thank you. Although I have a 40XX graphics card, I chose GGUF for its lower RAM usage since I only have 16GB of RAM.
