The node is having no effect in the example workflow for SDXL #62

Open
danganbenpa opened this issue Jan 12, 2025 · 12 comments

@danganbenpa

The node is having no effect in the example workflow for SDXL. There are no errors either. It is working properly with Flux in the Flux workflow, though.

@chengzeyi
Owner

It can speed up SDXL, but the gain is not as significant as with Flux! You can try increasing the value of residual_diff_threshold!
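
For context, residual_diff_threshold controls how willing the node is to reuse cached work instead of running the rest of the model. Below is a minimal sketch of that kind of decision, assuming a first-block-cache-style comparison between consecutive steps; the tensor names and exact formula are illustrative, not WaveSpeed's actual code.

```python
import torch

def should_reuse_cached_residual(curr_first_block: torch.Tensor,
                                 prev_first_block: torch.Tensor,
                                 residual_diff_threshold: float) -> bool:
    # Relative difference between this step's first-block output and the
    # previous step's. If they are close enough, the remaining blocks are
    # skipped and the previously cached residual is reused.
    rel_diff = (curr_first_block - prev_first_block).abs().mean() / (
        prev_first_block.abs().mean() + 1e-8)
    return rel_diff.item() < residual_diff_threshold

# A larger residual_diff_threshold makes this check pass on more steps, so more
# of the sampling run is served from the cache (faster, but quality can drift).
```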

@danganbenpa
Author

I'm finding that whatever I set residual_diff_threshold to, I get exactly the same output at exactly the same quality in exactly the same generation time.

@feffy380

@danganbenpa Try a different sampler. I don't know why, but I get no speedup with dpmpp_2m_sde in the example workflow, while dpmpp_2m works fine.

@Miczu

Miczu commented Jan 19, 2025

I have the same issue; it did speed up with the dpmpp_2m sampler.

@Miczu

Miczu commented Jan 19, 2025

I tested all the samplers; here are the results. I used a residual_diff_threshold of 0.73 to speed up testing. With this unusable setting, I noticed that some samplers only partially react to the node. Maybe they have very noisy residuals?

Good (big skips):
euler
euler_cfg_pp
euler_ancestral
euler_ancestral_cfg_pp
dpm_2
lms
dpm_fast
dpm_adaptive
dpmpp_2m
dpmpp_2m_cfg_pp
ipndm
ipndm_v
deis
res_multistep
res_multistep_cfg_pp
ddim
uni_pc
uni_pc_bh2

Somewhat bad (medium skips at extreme threshold):
dpmpp_2m_sde
dpmpp_2m_sde_gpu
dpmpp_3m_sde (medium skips at 0.73, no skips at 0.200)
dpmpp_3m_sde_gpu
ddpm

Bad (minimal skips at extreme threshold):
dpmpp_sde
dpmpp_sde_gpu

Very Bad (no skips):
heun
heunpp2
dpm_2_ancestral
dpmpp_2s_ancestral
dpmpp_2s_ancestral_cfg_pp
lcm

@Miczu

Miczu commented Jan 19, 2025

For reference: SDXL, 100 steps, euler, 0.2 threshold takes 8 seconds; with the node bypassed, 65 seconds.
For reference: SDXL, 30 steps, euler, 0.2 threshold takes 6 seconds; with the node bypassed, 19 seconds.

So I would say SDXL can get big gains from WaveSpeed; you just need to pick the right sampler, or there is something that makes some samplers behave weirdly.

@chengzeyi
Owner

For reference: SDXL, 100 steps, euler, 0.2 threshold takes 8 seconds; with the node bypassed, 65 seconds. For reference: SDXL, 30 steps, euler, 0.2 threshold takes 6 seconds; with the node bypassed, 19 seconds.

So I would say SDXL can get big gains from WaveSpeed; you just need to pick the right sampler, or there is something that makes some samplers behave weirdly.

I didn't know before that different samplers could make such a big difference. I'd like to know which samplers people commonly use with SDXL that benefit significantly from caching, so I can update the example workflow.

@Miczu

Miczu commented Jan 19, 2025

So, a proper test with 30 steps, a semi-complex prompt, and a threshold of 0.2:

Sampler / time
euler 6s
euler_cfg_pp 7s
euler_ancestral 20s
euler_ancestral_cfg_pp 20s
dpm_2 7s
dpm_2_ancestral 21s
lms 6s
dpm_fast 6s
dpm_adaptive 24s
dpmpp_2m 7s
dpmpp_2m_cfg_pp 8s
ipndm 6s
ipndm_v 6s
deis 7s
res_multistep 7s
res_multistep_cfg_pp 8s
ddim 6s
uni_pc 6s
uni_pc_bh2 6s
dpmpp_2m_sde 21s
dpmpp_3m_sde 21s
dpmpp_sde 33s
heun 20s
dpm_2_ancestral 20s
dpmpp_2s_ancestral 20s
lcm 20s

Maybe I'm sloppy with testing. Some samplers I assumed were fast ended up being slow in this round of more realistic testing.

Additionally, I tested this with an updated ComfyUI. I also have an older install, and that one is slow with all samplers using the same workflow. So updating ComfyUI should definitely be a recommendation.

@feffy380

feffy380 commented Jan 20, 2025

Certain samplers do multiple model calls, which could lead to the cache being invalidated on every step. If that's the cause, this might be fixed by #83
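
A rough sketch of the failure mode being hypothesized here, using a hypothetical single-entry cache (this is not the node's actual data structure): if reuse is decided by comparing each model call against the immediately preceding one, a sampler that makes extra calls with noticeably different inputs keeps failing that comparison, so nothing is ever reused.

```python
import torch

class SingleEntryResidualCache:
    """Hypothetical cache that only remembers the most recent model call."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.prev_marker = None      # e.g. the first-block output of the last call
        self.cached_result = None    # the expensive part we would like to reuse

    def lookup(self, marker: torch.Tensor):
        if self.prev_marker is None:
            return None
        rel_diff = (marker - self.prev_marker).abs().mean() / (
            self.prev_marker.abs().mean() + 1e-8)
        # An extra model call per step with a very different input (fresh noise,
        # a corrector pass, etc.) makes rel_diff large, so this returns None and
        # the cache never gets a chance to pay off.
        return self.cached_result if rel_diff.item() < self.threshold else None

    def store(self, marker: torch.Tensor, result: torch.Tensor) -> None:
        self.prev_marker = marker
        self.cached_result = result
```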

@chengzeyi
Owner

@Miczu I have tested some of the samplers with a 0.2 threshold and found that the quality changed significantly. Do you have the same issue?

@danganbenpa
Author

It doesn't make any difference which sampler I select, how many steps I use, or what I set residual_diff_threshold to; the outputs are always identical whether the node is on or off.

@blepping
Contributor

@feffy380

Certain samplers do multiple model calls, which could lead to the cache being invalidated on every step. If that's the cause, this might be fixed by #83

Usually multi-step samplers shouldn't be an issue, because they won't be calling the model on the same timestep. Most of the time they'll be calling it on descending timesteps, which works perfectly with FBCache. (Adaptive ODE solvers actually seem to work quite well.)

There's a pathological case where a multistep sampler might be jumping around in the sigmas, but I don't think that really happens much (ever?) in practice.


@danganbenpa

It doesn't make any difference which sampler I select, how many steps I use, or what I set residual_diff_threshold to; the outputs are always identical whether the node is on or off.

Are you using any model patches like PAG/SAG? Ancestral/SDE samplers can also essentially prevent FBCache from activating.


@Miczu

Maybe I'm sloppy with testing. Some samplers I assumed were fast ended up being slow in this round of more realistic testing.

Seems like it's mostly ancestral/SDE samplers that are slow. Those samplers inject new noise at every step, which pushes the model toward less similar results and keeps caching from activating. I made a cache-aware variant of Euler ancestral; you could try that if you want (some) ancestralness without hurting the ability to cache as much: https://gist.github.com/blepping/ec48891459afc3e9c30e5f94b0fcdb42 (note that this is very experimental, and it's my own personal thing, not endorsed by the WaveSpeed project or anything like that).
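
To make the noise-injection point concrete, here is a simplified Euler-ancestral-style step in the spirit of k-diffusion's formulation (a sketch, not the exact ComfyUI or k-diffusion code; the model signature is assumed): the fresh noise scaled by sigma_up, added after every step, changes the next model input in a way the previous step couldn't predict, so consecutive first-block outputs differ more and the cache comparison fails more often.

```python
import torch

def euler_ancestral_step_sketch(model, x, sigma, sigma_next, eta=1.0):
    # Split the step into a deterministic part (down to sigma_down) and a
    # stochastic part (fresh noise scaled by sigma_up), as ancestral samplers do.
    sigma_up = min(sigma_next,
                   eta * (sigma_next**2 * (sigma**2 - sigma_next**2) / sigma**2) ** 0.5)
    sigma_down = (sigma_next**2 - sigma_up**2) ** 0.5

    denoised = model(x, sigma)            # the call a first-block cache hooks into
    d = (x - denoised) / sigma            # Euler derivative
    x = x + d * (sigma_down - sigma)      # deterministic Euler step

    # Fresh noise injected every step: consecutive model inputs become less
    # similar, the relative-difference check fails more often, and the cache
    # rarely activates.
    x = x + torch.randn_like(x) * sigma_up
    return x
```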
