
The settings start = 0.2, end = 0.8, max_consecutive_cache_hits = 5 provide better results. #87

easygoing0114 opened this issue Jan 19, 2025 · 7 comments

@easygoing0114

I tested the effect of Dynamic Caching using the Flux.1-dev Anime Model.

Graph 1

  • start = 0
  • end = 1
  • max_consecutive_cache_hits = -1
[Graph 1: RDT vs. generation time and MAE/SSIM similarity (start = 0, end = 1, max_consecutive_cache_hits = -1)]

Graph 2

  • start = 0.2
  • end = 0.8
  • max_consecutive_cache_hits = 5
[Graph 2: RDT vs. generation time and MAE/SSIM similarity (start = 0.2, end = 0.8, max_consecutive_cache_hits = 5), adjusted scale]

The small numbers in the graphs represent the residual_diff_threshold.

The vertical axis shows similarity based on MAE (Mean Absolute Error) and SSIM (Structural Similarity Index), where lower values indicate greater image degradation.
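As a rough illustration of how these two metrics behave, here is a minimal numpy sketch. Note it uses a single-window ("global") form of SSIM purely to show the formula; real evaluations typically use a sliding-window implementation such as skimage.metrics.structural_similarity.

```python
import numpy as np

def mae(a, b):
    # Mean Absolute Error: 0.0 means identical images; larger = more degradation.
    return float(np.mean(np.abs(a - b)))

def global_ssim(a, b, data_range=255.0):
    # Single-window ("global") SSIM: 1.0 means identical images.
    # Illustrative only -- standard SSIM averages this statistic
    # over local sliding windows.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2))
                 / ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))
```

Identical images give MAE = 0 and SSIM = 1; both metrics degrade as the cached result drifts from the uncached reference.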

As shown in the graphs, the settings start = 0.2, end = 0.8, max_consecutive_cache_hits = 5 achieve a better balance between speed and image quality.

This verification focused solely on Dynamic Caching and did not utilize Enhanced torch.compile.
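For readers unfamiliar with what the three parameters gate, the per-step decision can be sketched roughly like this. This is a hypothetical simplification, not the actual WaveSpeed code, which compares first-block residuals and may differ in detail:

```python
def should_use_cache(step, total_steps, residual_diff, residual_diff_threshold,
                     start, end, consecutive_hits, max_consecutive_cache_hits):
    """Decide whether this sampling step can reuse the cached result.

    start/end are fractions of the sampling schedule; caching is only
    allowed inside that window. max_consecutive_cache_hits = -1 means
    unlimited back-to-back hits. Illustrative sketch only.
    """
    progress = step / total_steps
    if not (start <= progress <= end):
        return False  # outside the allowed caching window
    if 0 <= max_consecutive_cache_hits <= consecutive_hits:
        return False  # cap on consecutive cache hits reached
    return residual_diff < residual_diff_threshold
```

Under this reading, start = 0.2 forces the first ~20% of steps to always run the full model, and end = 0.8 does the same for the last ~20%.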

The measurement data and actual images are published on my blog.

Thank you for sharing this amazing feature!

@blepping
Contributor

Interesting, thanks for sharing that! Was this 30 steps with normal Euler? The schedule also probably makes a difference.

Also curious if you did any testing with different max_consecutive_cache_hits values, I use 1 or 2 myself. Probably would make the most sense to test values below 5, I doubt 6+ would be much different than unlimited.

@brahianrosswill

Interesting. I'll start testing different values now to find the best ones.

@easygoing0114
Author

@blepping, @brahianrosswill Thank you for your replies.

I conducted my measurements under the following conditions:

  • euler
  • normal
  • 30 steps
  • 1440 x 1440

Regarding the relationship between the sampler and the scheduler, I haven’t directly investigated it myself. However, @Shiba-2-shiba has published their verification results using HunyuanVideo on a Japanese blog.

According to their findings, the DDIM and Euler samplers and the Karras and Exponential schedulers appear to work better with Dynamic Caching.

Please consider using a translation tool to check out their blog.

I’ll continue to look into the optimal value for max_consecutive_cache_hits.

@blepping
Contributor

One thing I found through my own experimentation (which is probably pretty obvious) is that ancestral/SDE samplers mostly prevent the cache from activating. Makes sense that throwing random noise into the generation at each step is going to force the model to do something different, and when it does something different it can't be cached.

I made a cache aware version of Euler ancestral: https://gist.github.com/blepping/ec48891459afc3e9c30e5f94b0fcdb42 - it tries to defer ancestral steps while the model is returning similar results. I don't know if it's actually good, but it allows caching much more often than normal Euler ancestral while allowing at least some of the benefits of ancestral sampling.
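The deferral idea can be sketched as follows. This is a hypothetical simplification of the approach, not the gist's actual code; it relies on the fact that independent Gaussian noises add in variance, so deferred injections can be combined in quadrature:

```python
import math
import numpy as np

def defer_ancestral_noise(x, sigma_up, pending_var, cache_hit, rng):
    """Accumulate ancestral noise variance across cache hits; inject the
    combined noise once the model produces a fresh (non-cached) result.

    Illustrative sketch only: `pending_var` carries sum(sigma_up ** 2)
    across deferred steps, so the injected scale is its square root.
    """
    pending_var += sigma_up ** 2
    if cache_hit:
        return x, pending_var  # keep deferring while results stay similar
    x = x + rng.standard_normal(x.shape) * math.sqrt(pending_var)
    return x, 0.0
```

Deferring this way keeps consecutive model outputs similar (so the cache can keep firing) while still recovering some of the stochastic behavior of ancestral sampling.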

@easygoing0114
Author

easygoing0114 commented Jan 20, 2025

Additional Verification

@blepping

I conducted tests by varying the start, end, and max_consecutive_cache_hits parameters.

Graph 3 start

  • RDT = 1
  • end = 1
  • max hits = -1
[Graph: Dynamic Caching start (RDT = 1, end = 1, max hits = -1)]

Graph 4 end

  • RDT = 1
  • start = 0
  • max hits = -1
[Graph: Dynamic Caching end (RDT = 1, start = 0, max hits = -1)]

Graph 5 max_consecutive_cache_hits

  • RDT = 1
  • start = 0
  • end = 1
[Graph: Dynamic Caching max hits (RDT = 1, start = 0, end = 1)]

Graph 6 No Dynamic Caching (Steps)

Additionally, I examined the case where Dynamic Caching was not used, and the steps were varied.

[Graph: No Dynamic Caching, varying step counts]

Observations

  • Proper use of Dynamic Caching allows for faster image generation while maintaining quality, compared to simply reducing the number of steps.
  • The start parameter is the most important and should not be set to 0. A value around 0.3 seems to provide a good balance between image quality and speed.
  • Parameters such as residual_diff_threshold, end, and max_consecutive_cache_hits appear to have less direct impact on image quality.

The initial settings I tested (start = 0.2, end = 0.8, max_consecutive_cache_hits = 5) seem to offer a reasonable balance between quality and speed.

When controlling Dynamic Caching, it might be more effective to use a different metric that captures changes in the early stages of image generation, rather than relying on residual_diff_threshold.
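For reference, the comparison residual_diff_threshold makes is, roughly, a relative L1 change between consecutive first-block residuals. The sketch below is an assumption about the general shape of that metric, not the exact formula from the WaveSpeed source:

```python
import numpy as np

def relative_residual_diff(prev_residual, curr_residual, eps=1e-8):
    # Relative L1 change between consecutive first-block residuals.
    # A cache hit is allowed when this value falls below
    # residual_diff_threshold (RDT in the graphs above).
    denom = float(np.mean(np.abs(prev_residual))) + eps
    return float(np.mean(np.abs(curr_residual - prev_residual))) / denom
```

An alternative metric along the lines suggested above might, for example, weight low-frequency (large-structure) differences more heavily during the early steps, where composition is still being decided.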

The actual test data can be found on this page.

@Miczu

Miczu commented Jan 22, 2025

One thing I noticed from testing SDXL with WaveSpeed is that even the prompt can affect generation speed. I think the core issue is that some models/generations can be unstable: the model switches back and forth on how to arrange the image (sometimes leaving multiple pairs of arms/legs). The more unstable/chaotic the generation is, the worse WaveSpeed's performance and quality become.

If you want, you can make a graph with progressively more details added to the same prompt. The more information there is to put into the image, the more the model needs to work and adjust on every step. For a simple prompt, the "vision" of the end result is clear to the model, and it can use the cache without much degradation.

A bad LoRA can probably also cause model instability and degrade performance/quality, as many LoRAs are undertrained, overcooked, etc.

@easygoing0114
Author

easygoing0114 commented Jan 23, 2025

@Miczu

Thank you for your comment.

Here is additional information about the environment I used for my tests:

  • Model: Custom anime model of Flux.1
  • LoRA: Not used
  • Prompt: A simple prompt
    • anime, night, linear super express, train, lead car, dutch angle, close up, ultra high speed, motion blur

Although results might differ between Flux.1 and SDXL, the same text encoder and prompt were used, meaning the conditioning was cached (Stable Cache, not Dynamic). Therefore, I believe its influence on the results is minimal.

That said, your observation about image variations is valid. In this test, I used 30 steps, which might have been excessive. The actual image may have already been essentially complete by around step 20, with the remaining steps no longer affecting perceived quality but still producing minor differences in MAE and SSIM.

Additionally, I used MAE and SSIM as metrics for image quality, but their limitations have also become apparent.

Here are two images, both generated in half the time compared to the original:

RDT = 0.08 (102 sec)

[Image: residual_diff_threshold_0.08_start_0_end_1_diff]

Start = 0.3 (108 sec)

[Image: RDT1_start_0.3_end_1_max_hits_-1_diff]

The second image, with only the start parameter adjusted, shows a more stable overall structure and composition but less detailed rendering. Since MAE and SSIM are dominated by larger structures, the start setting scored very well on these metrics, so my findings may have leaned heavily toward the influence of start.

Finally, I observed another interesting phenomenon:

No Dynamic Caching 15 steps (106 sec)

[Image: No_Dynamic_Cash_step_15_diff]

No Dynamic Caching 15 steps + Dynamic Caching 15 steps (Total 30 steps, 147 sec)

[Image: RDT1_start_0.5_end_1_max_hits_-1_diff]

The illustration with an additional 15 steps of Dynamic Caching shows a more solid overall structure but with simplified details.

I suspect this may be due to minor errors introduced during inference with Dynamic Caching, even though the larger calculations remain consistent.

The second illustration might sometimes be preferred for its simplicity and persuasiveness.

Overall, while I still believe start is a critical parameter, I now think that other settings should also be adjusted appropriately.

The data is summarized in this blog post.
