The settings start = 0.2, end = 0.8, max_consecutive_cache_hits = 5 provide better results. #87
Comments
Interesting, thanks for sharing that! Was this 30 steps with normal Euler? The schedule also probably makes a difference. Also curious if you did any testing with different …
Interesting, I will start testing different values now to find the optimal ones.
@blepping, @brahianrosswill Thank you for your replies. I conducted my measurements under the following conditions:
Regarding the relationship between the sampler and the scheduler, I haven't directly investigated it myself. However, @Shiba-2-shiba has published their verification results using HunyuanVideo on a Japanese blog. According to their findings, the samplers … Please consider using a translation tool to check out their blog. I'll continue to look into the optimal value for …
One thing I found through my own experimentation (which is probably pretty obvious) is that ancestral/SDE samplers mostly prevent the cache from activating. It makes sense that throwing random noise into the generation at each step forces the model to do something different, and when it does something different, it can't be cached. I made a cache-aware version of Euler ancestral: https://gist.github.com/blepping/ec48891459afc3e9c30e5f94b0fcdb42 - it tries to defer ancestral steps while the model is returning similar results. I don't know if it's actually good, but it allows caching much more often than normal Euler ancestral while keeping at least some of the benefits of ancestral sampling.
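The deferral idea described above can be sketched roughly as follows. This is a hypothetical, simplified illustration, not the code in the linked gist; the names `relative_change`, `should_defer_noise`, and the `threshold` default are made up for the example:

```python
# Hypothetical sketch: defer ancestral noise injection while the model's
# consecutive denoised outputs stay similar, so cached results remain valid.
# Function names and the threshold value are illustrative, not the gist's API.

def relative_change(prev, curr):
    """Mean absolute change between two flat sequences, relative to prev's scale."""
    num = sum(abs(a - b) for a, b in zip(prev, curr))
    den = sum(abs(a) for a in prev) or 1e-8
    return num / den

def should_defer_noise(prev_out, curr_out, threshold=0.05):
    # Skip the ancestral noise injection this step if the denoised output
    # barely moved; inject it later once the model starts changing again.
    return relative_change(prev_out, curr_out) < threshold
```

The intent is that while outputs are near-identical (and the cache would hit), no fresh noise is added; once the model produces a genuinely different result, the deferred ancestral step fires.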
Additional Verification

I conducted tests by varying the following parameters:

Graph 3: start
Graph 4: end
Graph 5: max_consecutive_cache_hits
Graph 6: No Dynamic Caching (steps)

Additionally, I examined the case where Dynamic Caching was not used and only the number of steps was varied.

Observations

The initial settings I tested (…

The actual test data can be found on this page.
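For readers following along: start and end are fractions of the sampling run between which caching may activate. A minimal sketch of that windowing, where the helper name is hypothetical and not WaveSpeed's actual API:

```python
# Sketch of the start/end window: caching is only considered while the
# current step's fraction of the total steps lies within [start, end].
# The helper name is illustrative, not WaveSpeed's actual API.

def caching_allowed(step, total_steps, start=0.2, end=0.8):
    frac = step / total_steps
    return start <= frac <= end

# e.g. with 30 steps, start=0.2 and end=0.8 permit caching on steps 6..24,
# leaving the early structure-forming and final detail steps uncached
```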
One thing I noticed from testing SDXL with WaveSpeed is that even the prompt can affect generation speed. I think the core issue is that some models/generations can be unstable, as if the model switches back and forth over how to arrange the image (sometimes leaving multiple pairs of arms/legs). The more unstable/chaotic the generation is, the worse WaveSpeed's performance and quality become. If you want, you could make a graph with progressively more details added to the same prompt. The more information there is to put into the image, the more the model needs to work and adjust on every step. For a simple prompt, the "vision" of the end result is clear to the model, and it can use the cache without much degradation. A bad LoRA can probably also cause model instability and degrade performance/quality, as many LoRAs are undertrained, overcooked, etc.
@Miczu Thank you for your comment. Here is additional information about the environment I used for my tests:
Although results might differ between Flux.1 and SDXL, the same text encoder and prompt were used, meaning the conditioning was cached (Stable Cache, not Dynamic). Therefore, I believe its influence on the results is minimal.

That said, your observation about image variations is valid. In this test, I used 30 steps, which might have been excessive. The actual image may have already been completed around 20 steps, with no further impact on image quality but with minor differences affecting MAE and SSIM.

Additionally, I used MAE and SSIM as metrics for image quality, but their limitations have also become apparent. Here are two images, both generated in half the time compared to the original:

RDT = 0.08 (102 sec)
Start = 0.3 (108 sec)

The second image, with only the …

Finally, I observed another interesting phenomenon:

No Dynamic Caching, 15 steps (106 sec)
No Dynamic Caching 15 steps + Dynamic Caching 15 steps (total 30 steps, 147 sec)

The illustration with an additional 15 steps of Dynamic Caching shows a more solid overall structure but with simplified details. I suspect this may be due to minor errors introduced during inference with Dynamic Caching, even though the larger calculations remain consistent. The second illustration might sometimes be preferred for its simplicity and persuasiveness.

Overall, while I still believe … The data is summarized in this blog post.
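As a side note on the metrics discussed above, MAE is straightforward to compute; here is a minimal numpy sketch. SSIM would typically come from skimage.metrics.structural_similarity rather than being reimplemented, so only MAE is shown:

```python
# Minimal sketch of the MAE metric used in the measurements above.
# Assumes two equally-shaped images; for SSIM one would typically call
# skimage.metrics.structural_similarity instead of rolling one's own.
import numpy as np

def mae(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Mean absolute error between two equally-shaped images (e.g. floats in 0..1)."""
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    return float(np.mean(np.abs(diff)))
```

Note that MAE is a pixelwise average, which is exactly why the limitation mentioned above appears: two images that are both plausible but arranged differently can score poorly even when neither is degraded.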
I tested the effect of Dynamic Caching using the Flux.1-dev Anime Model.
Graph 1:
start = 0
end = 1
max_consecutive_cache_hits = -1

Graph 2:
start = 0.2
end = 0.8
max_consecutive_cache_hits = 5
The small numbers in the graphs represent the residual_diff_threshold. The vertical axis shows similarity based on MAE (Mean Absolute Error) and SSIM (Structural Similarity Index), where lower values indicate greater image degradation.
As shown in the graphs, the settings start = 0.2, end = 0.8, max_consecutive_cache_hits = 5 achieve a better balance between speed and image quality. This verification focused solely on Dynamic Caching and did not use Enhanced torch.compile. The measurement data and actual images are published on my blog.
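To make the parameter semantics concrete, here is a hedged sketch of the caching decision these settings control: reuse the previous output while the residual changes by less than residual_diff_threshold, for at most max_consecutive_cache_hits steps in a row (with -1 meaning unlimited). This mirrors the idea only, not WaveSpeed's exact implementation; the class and method names are made up:

```python
# Hedged sketch of the dynamic-caching decision described above.
# Class and method names are hypothetical, not WaveSpeed's actual API.
import numpy as np

class DynamicCache:
    def __init__(self, threshold=0.08, max_consecutive_hits=5):
        self.threshold = threshold          # residual_diff_threshold analogue
        self.max_hits = max_consecutive_hits  # -1 means no limit
        self.prev_residual = None
        self.hits = 0

    def can_reuse(self, residual: np.ndarray) -> bool:
        if self.prev_residual is not None:
            diff = float(np.abs(residual - self.prev_residual).mean())
            scale = float(np.abs(self.prev_residual).mean()) + 1e-8
            within_budget = self.max_hits < 0 or self.hits < self.max_hits
            if diff / scale < self.threshold and within_budget:
                self.hits += 1
                return True
        # Cache miss: recompute, store the new residual, reset the hit streak.
        self.prev_residual = residual.copy()
        self.hits = 0
        return False
```

Under this reading, max_consecutive_cache_hits = 5 caps how many steps in a row can be skipped, which bounds the error that accumulated cache reuse can introduce.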
Thank you for sharing this amazing feature!