Speed and optimisation #24
-
I'm starting this as a Q&A because it's mainly a question I've been wondering about: a friend of mine, for example, is running this on a GTX 960 (what a madman) and he's seeing up to 3 times the speed when doing inference in ComfyUI over Automatic1111's UI. On top of that, complicated workflows with more advanced conditioning get even slower (which is to be expected).
Replies: 3 comments 2 replies
-
Try using an fp16 model config in the CheckpointLoader node. That should speed things up a bit on newer cards; it should be at least as fast as the A1111 UI if you do that. If it isn't, let me know, because it's something I need to fix.
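For anyone wondering why fp16 helps: half precision uses 2 bytes per weight instead of 4, which halves memory traffic, and newer GPUs also have much faster fp16 math. A minimal numpy sketch of the memory halving (illustrative only, not ComfyUI code):

```python
import numpy as np

# A 512x512 tensor in fp32 vs fp16: fp16 stores 2 bytes per element
# instead of 4, so the same data takes half the memory bandwidth.
x32 = np.ones((512, 512), dtype=np.float32)
x16 = x32.astype(np.float16)

print(x32.nbytes)  # 1048576 bytes
print(x16.nbytes)  # 524288 bytes, exactly half
```

The speedup on older cards like a GTX 960 is smaller, since pre-Pascal/Turing GPUs lack fast native fp16 throughput.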
-
Found this fix for Automatic1111, and it works for ComfyUI as well. I have an RTX 4070 Ti and got these gains after updating to cuDNN 8.8.0 (model: ChilloutMix, 512x512, 20 steps, CFG 8, Euler):
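If you try the cuDNN upgrade, you can confirm which cuDNN build your PyTorch install is actually using with `torch.backends.cudnn.version()`. A small sketch (the guarded import is just so it degrades gracefully on machines without torch):

```python
def cudnn_version():
    """Return the cuDNN version PyTorch reports, or None if torch
    (or a CUDA build of it) is not available."""
    try:
        import torch
        return torch.backends.cudnn.version()  # e.g. 8800 for cuDNN 8.8.0
    except ImportError:
        return None

print(cudnn_version())
```

If the number printed is lower than the cuDNN release you dropped in, the old libraries are still being picked up.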
-
I think this can be closed because you can just use the CheckpointLoaderSimple now to get optimal performance. |