
Conversation

@rmatif (Contributor) commented Dec 8, 2025

This adds an experimental variant of an EasyCache-like feature for UNet models. I came up with the name "ucache" (if someone has a better suggestion, I'd take it). For now it uses a step-level skipping mechanism. I want to make it per-block to provide more granularity and control, but the current UNet implementation doesn't allow that yet, and the static nature of ggml graphs makes it difficult to capture precise UNet blocks. The results are good enough for now to make this a first iteration.

The threshold may vary with the sampler + scheduler combo.

```sh
./build/bin/sd -m models/model.safetensors --cfg-scale 7 -p "a cute cat sitting on a red pillow" --steps 20 -H 1024 -W 1024 -s 42 --ucache 1,0.15,0.95 --diffusion-fa
```
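For context, here is a minimal sketch of the step-level skipping idea described above, in the spirit of EasyCache's accumulated-change test. All names, the error metric, and the reset behavior are illustrative assumptions, not the actual sd.cpp implementation:

```cpp
// Hedged sketch of an EasyCache-style step-skipping decision (NOT sd.cpp code).
// Track how much the model output changes between consecutive steps; while the
// accumulated relative change stays under a threshold, the step can be skipped
// and the cached output reused.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct UCacheState {
    float accum_err = 0.0f;     // accumulated relative change across steps
    float threshold = 0.15f;    // possibly the 0.15 in --ucache 1,0.15,0.95
    std::vector<float> prev;    // previous step's model output
};

// Relative L1 change between the previous and current output.
static float rel_change(const std::vector<float>& a, const std::vector<float>& b) {
    float diff = 0.0f, norm = 1e-8f;
    for (size_t i = 0; i < a.size(); i++) {
        diff += std::fabs(b[i] - a[i]);
        norm += std::fabs(a[i]);
    }
    return diff / norm;
}

// Returns true if this step's output changed so little that the next
// step may reuse the cache; a large change resets the accumulator.
bool ucache_step(UCacheState& st, const std::vector<float>& out) {
    bool can_skip = false;
    if (!st.prev.empty()) {
        st.accum_err += rel_change(st.prev, out);
        if (st.accum_err < st.threshold) {
            can_skip = true;        // barely changed: safe to skip
        } else {
            st.accum_err = 0.0f;    // changed too much: compute, reset
        }
    }
    st.prev = out;
    return can_skip;
}
```

In the real feature the decision would sit inside the diffusion loop so the skipped step never runs the UNet at all; this sketch only shows the accumulate-and-threshold test.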

20 Steps:

| Baseline | 5/20 | 7/20 | 8/20 |
| --- | --- | --- | --- |
| 0 steps skipped (1x) | 5 steps skipped (~1.33x) | 7 steps skipped (~1.54x) | 8 steps skipped (~1.67x) |

*(comparison images omitted)*

30 Steps:

| Baseline | 10/30 | 11/30 | 12/30 | 14/30 | 15/30 |
| --- | --- | --- | --- | --- | --- |
| 0 skipped (1x) | 10 skipped (1.5x) | 11 skipped (~1.58x) | 12 skipped (~1.67x) | 14 skipped (~1.88x) | 15 skipped (2x) |

*(comparison images omitted)*

Supersedes #705

@wbruna (Contributor) commented Dec 8, 2025

This also has a nice side-effect on some low-CFG distilled models: the skipped steps help avoid the "overcooked" effect when using too many steps.

> I came up with the name "ucache" (if someone has a better suggestion, I'd take it).

Since EasyCache itself doesn't work with UNet, and ucache uses a similar algorithm, I'd suggest reusing the same command-line parameters and parameter struct, to make it simpler to use both for the command line and frontends.

For the same reason, perhaps a scaling factor could be applied to the threshold, to make similar values have similar behavior (at least for the default value)?

@rmatif (Contributor, Author) commented Dec 8, 2025

> Since EasyCache itself doesn't work with UNet, and ucache uses a similar algorithm, I'd suggest reusing the same command-line parameters and parameter struct, to make it simpler to use both for the command line and frontends.

I have thought about that, but I'm planning more changes to make it depth-aware, so it will diverge from the original EasyCache implementation. Since the latter is working well, I wanted to leave it unchanged. I'm still reusing the EasyCache hooks, though, to avoid some duplication.

> For the same reason, perhaps a scaling factor could be applied to the threshold, to make similar values have similar behavior (at least for the default value)?

If the threshold were the same across samplers/schedulers, I'd say yes, but that feels like a hacky way to do arbitrary scaling. Plus the threshold is not only different across combos but also sometimes not sensitive enough: different values will get you a similar number of skipped steps. We lack granularity inside a single step; I'll work on unifying this.

@wbruna (Contributor) commented Dec 8, 2025

> I have thought about that, but I'm planning more changes to make it depth-aware, so it will diverge from the original EasyCache implementation. Since the latter is working well, I wanted to leave it unchanged. I'm still reusing the EasyCache hooks, though, to avoid some duplication.

My suggestion is from a usability side, not a development one. If I need to specify "turn on the cache implementation" in, say, a Koboldcpp config file or an sd.cpp-webui field, it's much easier if I don't have separate fields for each model version. Especially because the model version isn't really available at that point (there's no reliable way to figure it out outside sd.cpp code - not even main.cpp has that information). I'd need to either duplicate the fields and leave it to the user to figure out what she needs, or always fill both and tolerate the warning messages.

In reality, Koboldcpp would avoid that anyway by patching stable-diffusion.cpp (each patch adds maintenance overhead, but it's better than the alternative). Command-line users like sd.cpp-webui, or users that don't patch the library, don't have that option.

Even if the cache types used completely different parameters, they have defaults, so a simple flag/checkbox "turn the default cache on" would still be useful, and easy to support and use.

And it'd be zero change for the EasyCache implementation: just reuse its parameter struct. The code won't mind a few extra fields, if they are needed.

> If the threshold were the same across samplers/schedulers, I'd say yes, but that feels like a hacky way to do arbitrary scaling. Plus the threshold is not only different across combos but also sometimes not sensitive enough: different values will get you a similar number of skipped steps. We lack granularity inside a single step; I'll work on unifying this.

Again, my comment was about unifying the default value from a user's POV (essentially a flat ×5 on the input for UNet, so that 0.2 is always the default). I agree that doesn't matter much if you intend to have different defaults depending on model version/sampler/scheduler.
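One possible reading of that flat-scaling idea, sketched below. The function names, the factor, and the direction of the scaling are all assumptions drawn from the comment above, not sd.cpp code:

```cpp
// Hypothetical illustration of a flat scaling factor so the same user-facing
// threshold default (e.g. 0.2) behaves comparably for both cache types.
// None of these names exist in sd.cpp.
#include <cassert>

float scaled_error(float raw_error, bool is_unet) {
    const float kUnetScale = 5.0f;  // the "flat *5 on the input" for UNet
    return is_unet ? raw_error * kUnetScale : raw_error;
}

bool should_skip(float raw_error, float user_threshold, bool is_unet) {
    // The user always configures one threshold; the measured change is
    // scaled per model family before the comparison.
    return scaled_error(raw_error, is_unet) < user_threshold;
}
```

With this shape, a UNet model reaches the threshold five times sooner than a DiT model for the same raw error, while the user-facing default stays a single number.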

@Green-Sky (Contributor) commented

FYI, I had to go down to 0.05 or 0.025 to get stable results with 36 steps.


Regarding a unified command argument, maybe something like --latent-cache or --prediction-cache or something.

@leejet (Owner) commented Dec 9, 2025

Maybe we should use --cache-mode to control the caching method (and disable caching if it’s not configured), and use --cache-option to configure the cache parameters?

@rmatif (Contributor, Author) commented Dec 9, 2025

> Maybe we should use --cache-mode to control the caching method (and disable caching if it's not configured), and use --cache-option to configure the cache parameters?

Something like that? Or do you want to expose those on the C API?

> FYI, I had to go down to 0.05 or 0.025 to get stable results with 36 steps.

On which sampler/scheduler?

> My suggestion is from a usability side, not a development one. If I need to specify "turn on the cache implementation" in, say, a Koboldcpp config file or an sd.cpp-webui field, it's much easier if I don't have separate fields for each model version. Especially because the model version isn't really available at that point (there's no reliable way to figure it out outside sd.cpp code - not even main.cpp has that information). I'd need to either duplicate the fields and leave it to the user to figure out what she needs, or always fill both and tolerate the warning messages.

That's a valid point, and I think it's the right goal to reach. It's true that the app isn't aware of the model, as with a lot of options here. Since this is experimental and I'll be iterating on it, I think it's fine to keep it manual for now; once it's good enough, sd.cpp can have a single cache option for every model.

@rmatif (Contributor, Author) commented Dec 10, 2025

Added some tweaks; now it can sometimes accidentally add a nice aesthetic pattern compared to the baseline.

20 Steps:

| Baseline | 5/20 | 6/20 | 7/20 | 8/20 | 9/20 | 10/20 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 steps skipped (1x) | 5 steps skipped (~1.33x) | 6 steps skipped (~1.43x) | 7 steps skipped (~1.54x) | 8 steps skipped (~1.67x) | 9 steps skipped (~1.82x) | 10 steps skipped (2x) |

*(comparison images omitted)*

30 Steps:

| Baseline | 8/30 | 10/30 | 12/30 | 13/30 | 14/30 | 15/30 | 16/30 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 steps skipped (1x) | 8 steps skipped (~1.36x) | 10 steps skipped (1.5x) | 12 steps skipped (~1.67x) | 13 steps skipped (~1.76x) | 14 steps skipped (~1.88x) | 15 steps skipped (2x) | 16 steps skipped (~2.14x) |

*(comparison images omitted)*

@Green-Sky (Contributor) commented

> FYI, I had to go down to 0.05 or 0.025 to get stable results with 36 steps.

> On which sampler/scheduler?

I was using `--cfg-scale 8 --steps 36 --scheduler karras --sampling-method dpm++2m` with cyberrealisticxl.

Will test the new code later and play some more with the params.

@leejet (Owner) commented Dec 10, 2025

> Something like that? Or do you want to expose those on the C API?

The command arguments are just as I expected.
By the way, maybe we can put all the cache-related parameters into a single struct in the API?
Then when we add more cache methods later, we won’t need a separate struct for each one — and it seems like multiple cache methods can’t be active at the same time anyway.
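A sketch of what such a single struct might look like. The enum and field names here are invented for illustration and are not the actual sd.cpp API; the field set just mirrors the parameters discussed in this thread:

```cpp
// Hypothetical unified cache configuration: one struct for all cache
// methods, with a mode enum selecting which (if any) is active.
#include <cassert>

enum sd_cache_mode_t {
    SD_CACHE_NONE = 0,   // caching disabled (the default when not configured)
    SD_CACHE_EASYCACHE,  // EasyCache for DiT-style models
    SD_CACHE_UCACHE,     // ucache for UNet models
};

struct sd_cache_params_t {
    sd_cache_mode_t mode = SD_CACHE_NONE;
    // Shared parameters; a given mode simply ignores fields it doesn't use.
    float threshold     = 0.2f;   // skip threshold (meaning varies per mode)
    float start_percent = 0.0f;   // fraction of steps before caching starts
    float end_percent   = 0.95f;  // fraction of steps after which caching stops
};
```

Adding a new cache method then only means adding an enum value (and, if needed, a field that other modes ignore), so frontends can expose a single "cache" setting regardless of model version.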
