
The settings start = 0.2, end = 0.8, max_consecutive_cache_hits = 5 provide better results. #87

easygoing0114 opened this issue Jan 19, 2025 · 7 comments

@easygoing0114

I tested the effect of Dynamic Caching using the Flux.1-dev Anime Model.

Graph 1

  • start = 0
  • end = 1
  • max_consecutive_cache_hits = -1
[Graph 1: RDT vs. generation time and MAE/SSIM similarity (start = 0, end = 1, max_consecutive_cache_hits = -1)]

Graph 2

  • start = 0.2
  • end = 0.8
  • max_consecutive_cache_hits = 5
[Graph 2: RDT vs. generation time and MAE/SSIM similarity (start = 0.2, end = 0.8, max_consecutive_cache_hits = 5), adjusted scale]

The small numbers in the graphs represent the residual_diff_threshold.

The vertical axis shows similarity based on MAE (Mean Absolute Error) and SSIM (Structural Similarity Index), where lower values indicate greater image degradation.
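As a rough illustration of how these two metrics behave, here is a minimal numpy sketch. Note it uses a single-window ("global") form of SSIM purely to show the formula; real evaluations typically use a sliding-window implementation such as skimage.metrics.structural_similarity.

```python
import numpy as np

def mae(a, b):
    # Mean Absolute Error: 0.0 means identical images; larger = more degradation.
    return float(np.mean(np.abs(a - b)))

def global_ssim(a, b, data_range=255.0):
    # Single-window ("global") SSIM: 1.0 means identical images.
    # Illustrative only -- standard SSIM averages this statistic
    # over local sliding windows.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2))
                 / ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))
```

Identical images give MAE = 0 and SSIM = 1; both metrics degrade as the cached result drifts from the uncached reference.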

As shown in the graphs, the settings start = 0.2, end = 0.8, max_consecutive_cache_hits = 5 achieve a better balance between speed and image quality.

This verification focused solely on Dynamic Caching and did not utilize Enhanced torch.compile.
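For readers unfamiliar with what the three parameters gate, the per-step decision can be sketched roughly like this. This is a hypothetical simplification, not the actual WaveSpeed code, which compares first-block residuals and may differ in detail:

```python
def should_use_cache(step, total_steps, residual_diff, residual_diff_threshold,
                     start, end, consecutive_hits, max_consecutive_cache_hits):
    """Decide whether this sampling step can reuse the cached result.

    start/end are fractions of the sampling schedule; caching is only
    allowed inside that window. max_consecutive_cache_hits = -1 means
    unlimited back-to-back hits. Illustrative sketch only.
    """
    progress = step / total_steps
    if not (start <= progress <= end):
        return False  # outside the allowed caching window
    if 0 <= max_consecutive_cache_hits <= consecutive_hits:
        return False  # cap on consecutive cache hits reached
    return residual_diff < residual_diff_threshold
```

Under this reading, start = 0.2 forces the first ~20% of steps to always run the full model, and end = 0.8 does the same for the last ~20%.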

The measurement data and actual images are published on my blog.

Thank you for sharing this amazing feature!

@blepping
Contributor

Interesting, thanks for sharing that! Was this 30 steps with normal Euler? The schedule also probably makes a difference.

Also curious if you did any testing with different max_consecutive_cache_hits values, I use 1 or 2 myself. Probably would make the most sense to test values below 5, I doubt 6+ would be much different than unlimited.

@brahianrosswill

Interesting. I'll start testing different values now to find the best ones.

@easygoing0114
Author

@blepping, @brahianrosswill Thank you for your replies.

I conducted my measurements under the following conditions:

  • euler
  • normal
  • 30 steps
  • 1440 x 1440

Regarding the relationship between the sampler and the scheduler, I haven’t directly investigated it myself. However, @Shiba-2-shiba has published their verification results using HunyuanVideo on a Japanese blog.

According to their findings, the DDIM and Euler samplers and the Karras and Exponential schedulers appear to work better with Dynamic Caching.

Please consider using a translation tool to check out their blog.

I’ll continue to look into the optimal value for max_consecutive_cache_hits.

@blepping
Contributor

One thing I found through my own experimentation (which is probably pretty obvious) is that ancestral/SDE samplers mostly prevent the cache from activating. Makes sense that throwing random noise into the generation at each step is going to force the model to do something different, and when it does something different it can't be cached.

I made a cache aware version of Euler ancestral: https://gist.github.com/blepping/ec48891459afc3e9c30e5f94b0fcdb42 - it tries to defer ancestral steps while the model is returning similar results. I don't know if it's actually good, but it allows caching much more often than normal Euler ancestral while allowing at least some of the benefits of ancestral sampling.
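The deferral idea can be sketched as follows. This is a hypothetical simplification of the approach, not the gist's actual code; it relies on the fact that independent Gaussian noises add in variance, so deferred injections can be combined in quadrature:

```python
import math
import numpy as np

def defer_ancestral_noise(x, sigma_up, pending_var, cache_hit, rng):
    """Accumulate ancestral noise variance across cache hits; inject the
    combined noise once the model produces a fresh (non-cached) result.

    Illustrative sketch only: `pending_var` carries sum(sigma_up ** 2)
    across deferred steps, so the injected scale is its square root.
    """
    pending_var += sigma_up ** 2
    if cache_hit:
        return x, pending_var  # keep deferring while results stay similar
    x = x + rng.standard_normal(x.shape) * math.sqrt(pending_var)
    return x, 0.0
```

Deferring this way keeps consecutive model outputs similar (so the cache can keep firing) while still recovering some of the stochastic behavior of ancestral sampling.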

@easygoing0114
Author

easygoing0114 commented Jan 20, 2025

Additional Verification

@blepping

I conducted tests by varying the start, end, and max_consecutive_cache_hits parameters.

Graph 3 start

  • RDT = 1
  • end = 1
  • max hits = -1
[Graph: Dynamic Caching start (RDT = 1, end = 1, max hits = -1)]

Graph 4 end

  • RDT = 1
  • start = 0
  • max hits = -1
[Graph: Dynamic Caching end (RDT = 1, start = 0, max hits = -1)]

Graph 5 max_consecutive_cache_hits

  • RDT = 1
  • start = 0
  • end = 1
[Graph: Dynamic Caching max hits (RDT = 1, start = 0, end = 1)]

Graph 6 No Dynamic Caching (Steps)

Additionally, I examined the case where Dynamic Caching was not used, and the steps were varied.

[Graph: No Dynamic Caching, varying step counts]

Observations

  • Proper use of Dynamic Caching allows for faster image generation while maintaining quality, compared to simply reducing the number of steps.
  • The start parameter is the most important and should not be set to 0. A value around 0.3 seems to provide a good balance between image quality and speed.
  • Parameters such as residual_diff_threshold, end, and max_consecutive_cache_hits appear to have less direct impact on image quality.

The initial settings I tested (start = 0.2, end = 0.8, max_consecutive_cache_hits = 5) seem to offer a reasonable balance between quality and speed.

When controlling Dynamic Caching, it might be more effective to use a different metric that captures changes in the early stages of image generation, rather than relying on residual_diff_threshold.
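For reference, the comparison residual_diff_threshold makes is, roughly, a relative L1 change between consecutive first-block residuals. The sketch below is an assumption about the general shape of that metric, not the exact formula from the WaveSpeed source:

```python
import numpy as np

def relative_residual_diff(prev_residual, curr_residual, eps=1e-8):
    # Relative L1 change between consecutive first-block residuals.
    # A cache hit is allowed when this value falls below
    # residual_diff_threshold (RDT in the graphs above).
    denom = float(np.mean(np.abs(prev_residual))) + eps
    return float(np.mean(np.abs(curr_residual - prev_residual))) / denom
```

An alternative metric along the lines suggested above might, for example, weight low-frequency (large-structure) differences more heavily during the early steps, where composition is still being decided.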

The actual test data can be found on this page.

@Miczu

Miczu commented Jan 22, 2025

One thing I noticed from testing SDXL with WaveSpeed is that even the prompt can affect generation speed. I think the core issue is that some models/generations can be unstable: the model switches back and forth on how to arrange the image (sometimes leaving multiple pairs of arms/legs). The more unstable/chaotic the generation is, the worse WaveSpeed's performance and quality become.

If you want, you can make a graph with progressively more details added to the same prompt. The more information there is to put into the image, the more the model needs to work and adjust on every step. For a simple prompt, the "vision" of the end result is clear to the model, and it can use the cache without much degradation.

A bad LoRA can probably also cause model instability and degrade performance/quality, as many LoRAs are undertrained, overcooked, etc.

@easygoing0114
Author

easygoing0114 commented Jan 23, 2025

@Miczu

Thank you for your comment.

Here is additional information about the environment I used for my tests:

  • Model: Custom anime model of Flux.1
  • LoRA: Not used
  • Prompt: A simple prompt
    • anime, night, linear super express, train, lead car, dutch angle, close up, ultra high speed, motion blur

Although results might differ between Flux.1 and SDXL, the same text encoder and prompt were used, meaning the conditioning was cached (Stable Cache, not Dynamic). Therefore, I believe its influence on the results is minimal.

That said, your observation about image variations is valid. In this test, I used 30 steps, which might have been excessive. The actual image may have already been essentially complete by around step 20, with the remaining steps no longer affecting perceived quality but still producing minor differences in MAE and SSIM.

Additionally, I used MAE and SSIM as metrics for image quality, but their limitations have also become apparent.

Here are two images, both generated in half the time compared to the original:

RDT = 0.08 (102 sec)

[Image: residual_diff_threshold_0.08_start_0_end_1_diff]

Start = 0.3 (108 sec)

[Image: RDT1_start_0.3_end_1_max_hits_-1_diff]

The second image, with only the start parameter adjusted, shows a more stable overall structure and composition but less detailed rendering. Since MAE and SSIM are dominated by larger structures, the start setting scored very well on these metrics, so my findings may have leaned heavily toward the influence of start.

Finally, I observed another interesting phenomenon:

No Dynamic Caching 15 steps (106 sec)

[Image: No_Dynamic_Cash_step_15_diff]

No Dynamic Caching 15 steps + Dynamic Caching 15 steps (Total 30 steps, 147 sec)

[Image: RDT1_start_0.5_end_1_max_hits_-1_diff]

The illustration with an additional 15 steps of Dynamic Caching shows a more solid overall structure but with simplified details.

I suspect this may be due to minor errors introduced during inference with Dynamic Caching, even though the larger calculations remain consistent.

The second illustration might sometimes be preferred for its simplicity and persuasiveness.

Overall, while I still believe start is a critical parameter, I now think that other settings should also be adjusted appropriately.

The data is summarized in this blog post.
