
t5xxl_fp8_e4m3fn_scaled.safetensors and t5-v1_1-xxl-encoder-Q8_0.gguf #138

Open
Amazon90 opened this issue Oct 24, 2024 · 3 comments

Amazon90 commented Oct 24, 2024

What are the differences between these two models, and how should I choose between them? In terms of precision, which one is higher?

https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/blob/main/t5-v1_1-xxl-encoder-Q8_0.gguf

https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/t5xxl_fp8_e4m3fn_scaled.safetensors

Kwisss commented Oct 24, 2024

I tried both but could not really see a difference in prompt following. I think the two CLIP encoders do most of the heavy lifting, though.

city96 (Owner) commented Oct 24, 2024

I think the new scaled FP8 ones should be on par with Q8_0 for T5. The logic is fairly similar from what I can tell, but I haven't tested them myself.
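
For a rough sense of how the two schemes compare numerically, here is a minimal sketch (not ComfyUI's or llama.cpp's actual code) that round-trips a random weight tensor through both: a single per-tensor scale with an e4m3fn cast, versus Q8_0-style blocks of 32 int8 values with one scale per block. It assumes PyTorch 2.1+ for the `float8_e4m3fn` dtype; 448 is the largest finite e4m3fn value and 127 the int8 maximum.

```python
# Minimal sketch, not either project's actual code: compare the
# round-trip quantization error of scaled FP8 vs Q8_0-style int8 blocks.
import torch

w = torch.randn(4096, 4096) * 0.02           # weights at a typical scale

# Scaled FP8: one scale for the whole tensor, cast to e4m3fn.
scale = w.abs().max() / 448.0                # 448 = max finite e4m3fn value
w_fp8 = (w / scale).to(torch.float8_e4m3fn)
w_fp8_dq = w_fp8.to(torch.float32) * scale   # dequantize for comparison

# Q8_0: blocks of 32 values, one scale per block, int8 payload.
blocks = w.reshape(-1, 32)
bscale = blocks.abs().amax(dim=1, keepdim=True) / 127.0
q = (blocks / bscale).round().clamp(-127, 127).to(torch.int8)
w_q8_dq = (q.float() * bscale).reshape(w.shape)

print("scaled fp8 mean abs error:", (w - w_fp8_dq).abs().mean().item())
print("q8_0       mean abs error:", (w - w_q8_dq).abs().mean().item())
```

On random data the per-block int8 scheme tends to come out ahead, since e4m3 carries only three mantissa bits, but both errors are small enough that output differences are hard to notice in practice.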

The main benefit of FP8 is that it's faster (even more so on 40XX cards with the launch arg) and natively supported in ComfyUI.

I guess the main benefit of GGUF would be lower initial system memory usage on Windows due to the way the model is loaded.
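
As a generic illustration of that loading difference (this is not ComfyUF-GGUF's actual loader; the filename just assumes the model above was saved locally): a memory-mapped file lets the OS page tensor data in on demand instead of reading the whole file into RAM up front, so the initial footprint can stay low.

```python
# Generic sketch of mmap-based loading, not ComfyUI-GGUF's actual loader.
import mmap

with open("t5-v1_1-xxl-encoder-Q8_0.gguf", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Only the pages actually touched are faulted into memory, so a
    # multi-gigabyte file does not need that much free RAM to be opened.
    print(mm[:4])  # b'GGUF' magic for a valid file
    mm.close()
```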

Amazon90 (Author) commented

Thank you. Although I have a 40XX graphics card, I chose GGUF for its lower RAM usage since I only have 16GB of RAM.
