
Conversation

@rmatif (Contributor) commented Dec 8, 2025

This adds an experimental variant of an EasyCache-like feature for UNet models. I came up with the name "ucache" (if someone has a better suggestion, I'd take it). For now it uses a step-level skipping mechanism. I want to make it per-block to provide more granularity and control, but the current UNet implementation doesn't allow that yet, and the static nature of ggml graphs makes it difficult to capture precise UNet blocks. The results are good enough for now to make this a first iteration.

The threshold may vary with the sampler + scheduler combo.

```sh
./build/bin/sd -m models/model.safetensors --cfg-scale 7 -p "a cute cat sitting on a red pillow" --steps 20 -H 1024 -W 1024 -s 42 --ucache 1,0.15,0.95 --diffusion-fa
```
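For context, here is a minimal sketch of the step-level skipping idea described above, in the spirit of EasyCache's accumulated-change test. All names, the error metric, and the reset behavior are illustrative assumptions, not the actual sd.cpp implementation:

```cpp
// Hedged sketch of an EasyCache-style step-skipping decision (NOT sd.cpp code).
// Track how much the model output changes between consecutive steps; while the
// accumulated relative change stays under a threshold, the step can be skipped
// and the cached output reused.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct UCacheState {
    float accum_err = 0.0f;     // accumulated relative change across steps
    float threshold = 0.15f;    // possibly the 0.15 in --ucache 1,0.15,0.95
    std::vector<float> prev;    // previous step's model output
};

// Relative L1 change between the previous and current output.
static float rel_change(const std::vector<float>& a, const std::vector<float>& b) {
    float diff = 0.0f, norm = 1e-8f;
    for (size_t i = 0; i < a.size(); i++) {
        diff += std::fabs(b[i] - a[i]);
        norm += std::fabs(a[i]);
    }
    return diff / norm;
}

// Returns true if this step's output changed so little that the next
// step may reuse the cache; a large change resets the accumulator.
bool ucache_step(UCacheState& st, const std::vector<float>& out) {
    bool can_skip = false;
    if (!st.prev.empty()) {
        st.accum_err += rel_change(st.prev, out);
        if (st.accum_err < st.threshold) {
            can_skip = true;        // barely changed: safe to skip
        } else {
            st.accum_err = 0.0f;    // changed too much: compute, reset
        }
    }
    st.prev = out;
    return can_skip;
}
```

In the real feature the decision would sit inside the diffusion loop so the skipped step never runs the UNet at all; this sketch only shows the accumulate-and-threshold test.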

20 Steps:

| Baseline | 5/20 | 7/20 | 8/20 |
| --- | --- | --- | --- |
| 0 steps skipped (1x) | 5 steps skipped (~1.33x) | 7 steps skipped (~1.54x) | 8 steps skipped (~1.67x) |

*(comparison images omitted)*

30 Steps:

| Baseline | 10/30 | 11/30 | 12/30 | 14/30 | 15/30 |
| --- | --- | --- | --- | --- | --- |
| 0 skipped (1x) | 10 skipped (1.5x) | 11 skipped (~1.58x) | 12 skipped (~1.67x) | 14 skipped (~1.88x) | 15 skipped (2x) |

*(comparison images omitted)*

Supersedes #705

@wbruna (Contributor) commented Dec 8, 2025

This also has a nice side-effect on some low-CFG distilled models: the skipped steps help avoid the "overcooked" effect when using too many steps.

> I came up with the name "ucache" (if someone has a better suggestion, I'd take it).

Since EasyCache itself doesn't work with UNet, and ucache uses a similar algorithm, I'd suggest reusing the same command-line parameters and parameter struct, to make it simpler to use both for the command line and frontends.

For the same reason, perhaps a scaling factor could be applied to the threshold, to make similar values have similar behavior (at least for the default value)?

@rmatif (Contributor, Author) commented Dec 8, 2025

> Since EasyCache itself doesn't work with UNet, and ucache uses a similar algorithm, I'd suggest reusing the same command-line parameters and parameter struct, to make it simpler to use both for the command line and frontends.

I have thought about that, but I'm planning more changes to make it depth-aware, so it will diverge from the original EasyCache implementation. Since the latter is working well, I wanted to leave it unchanged. I'm still reusing the EasyCache hooks, though, to avoid some duplication.

> For the same reason, perhaps a scaling factor could be applied to the threshold, to make similar values have similar behavior (at least for the default value)?

If the threshold were the same across samplers/schedulers, I'd say yes, but that feels like a hacky way to do arbitrary scaling. Plus the threshold is not only different across combos but also sometimes not sensitive enough: different values will get you a similar number of skipped steps. We lack granularity inside a single step; I'll work on unifying this.

@wbruna (Contributor) commented Dec 8, 2025

> I have thought about that, but I'm planning more changes to make it depth-aware, so it will diverge from the original EasyCache implementation. Since the latter is working well, I wanted to leave it unchanged. I'm still reusing the EasyCache hooks, though, to avoid some duplication.

My suggestion is from a usability side, not a development one. If I need to specify "turn on the cache implementation" in, say, a Koboldcpp config file or an sd.cpp-webui field, it's much easier if I don't have separate fields for each model version. Especially because the model version isn't really available at that point (there's no reliable way to figure it out outside sd.cpp code - not even main.cpp has that information). I'd need to either duplicate the fields and leave it to the user to figure out what she needs, or always fill both and tolerate the warning messages.

In reality, Koboldcpp would avoid that anyway by patching stable-diffusion.cpp (each patch adds maintenance overhead, but it's better than the alternative). Command-line users like sd.cpp-webui, or users that don't patch the library, don't have that option.

Even if the cache types used completely different parameters, they have defaults, so a simple flag/checkbox "turn the default cache on" would still be useful, and easy to support and use.

And it'd be zero change for the EasyCache implementation: just reuse its parameter struct. The code won't mind a few extra fields, if they are needed.

> If the threshold were the same across samplers/schedulers, I'd say yes, but that feels like a hacky way to do arbitrary scaling. Plus the threshold is not only different across combos but also sometimes not sensitive enough: different values will get you a similar number of skipped steps. We lack granularity inside a single step; I'll work on unifying this.

Again, my comment was about unifying the default value from a user's POV (essentially a flat ×5 on the input for UNet, so that 0.2 is always the default). I agree that doesn't matter much if you intend to have different defaults depending on model version/sampler/scheduler.
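One possible reading of that flat-scaling idea, sketched below. The function names, the factor, and the direction of the scaling are all assumptions drawn from the comment above, not sd.cpp code:

```cpp
// Hypothetical illustration of a flat scaling factor so the same user-facing
// threshold default (e.g. 0.2) behaves comparably for both cache types.
// None of these names exist in sd.cpp.
#include <cassert>

float scaled_error(float raw_error, bool is_unet) {
    const float kUnetScale = 5.0f;  // the "flat *5 on the input" for UNet
    return is_unet ? raw_error * kUnetScale : raw_error;
}

bool should_skip(float raw_error, float user_threshold, bool is_unet) {
    // The user always configures one threshold; the measured change is
    // scaled per model family before the comparison.
    return scaled_error(raw_error, is_unet) < user_threshold;
}
```

With this shape, a UNet model reaches the threshold five times sooner than a DiT model for the same raw error, while the user-facing default stays a single number.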

@Green-Sky (Contributor) commented

FYI, I had to go down to 0.05 or 0.025 to get stable results with 36 steps.


Regarding a unified command argument, maybe something like --latent-cache or --prediction-cache or something.

@leejet (Owner) commented Dec 9, 2025

Maybe we should use --cache-mode to control the caching method (and disable caching if it’s not configured), and use --cache-option to configure the cache parameters?

@rmatif (Contributor, Author) commented Dec 9, 2025

> Maybe we should use --cache-mode to control the caching method (and disable caching if it's not configured), and use --cache-option to configure the cache parameters?

Something like that? Or do you want to expose those on the C API?

> FYI, I had to go down to 0.05 or 0.025 to get stable results with 36 steps.

On which sampler/scheduler?

> My suggestion is from a usability side, not a development one. If I need to specify "turn on the cache implementation" in, say, a Koboldcpp config file or an sd.cpp-webui field, it's much easier if I don't have separate fields for each model version. Especially because the model version isn't really available at that point (there's no reliable way to figure it out outside sd.cpp code - not even main.cpp has that information). I'd need to either duplicate the fields and leave it to the user to figure out what she needs, or always fill both and tolerate the warning messages.

That's a valid point, and I think it's the right goal to reach. It's true that the app isn't aware of the model, as with a lot of options here. Since this is experimental and I'll be iterating on it, I think it's fine to keep it manual for now; once it's good enough, sd.cpp can have a single cache option for every model.

@rmatif (Contributor, Author) commented Dec 10, 2025

Added some tweaks; now it can sometimes accidentally add a nice aesthetic pattern compared to the baseline.

20 Steps:

| Baseline | 5/20 | 6/20 | 7/20 | 8/20 | 9/20 | 10/20 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 steps skipped (1x) | 5 steps skipped (~1.33x) | 6 steps skipped (~1.43x) | 7 steps skipped (~1.54x) | 8 steps skipped (~1.67x) | 9 steps skipped (~1.82x) | 10 steps skipped (2x) |

*(comparison images omitted)*

30 Steps:

| Baseline | 8/30 | 10/30 | 12/30 | 13/30 | 14/30 | 15/30 | 16/30 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 steps skipped (1x) | 8 steps skipped (~1.36x) | 10 steps skipped (1.5x) | 12 steps skipped (~1.67x) | 13 steps skipped (~1.76x) | 14 steps skipped (~1.88x) | 15 steps skipped (2x) | 16 steps skipped (~2.14x) |

*(comparison images omitted)*

@Green-Sky (Contributor) commented

> FYI, I had to go down to 0.05 or 0.025 to get stable results with 36 steps.

> On which sampler/scheduler?

I was using `--cfg-scale 8 --steps 36 --scheduler karras --sampling-method dpm++2m` with cyberrealisticxl.

Will test the new code later and play some more with the params.

@leejet (Owner) commented Dec 10, 2025

> Something like that? Or do you want to expose those on the C API?

The command arguments are just as I expected.
By the way, maybe we can put all the cache-related parameters into a single struct in the API?
Then when we add more cache methods later, we won’t need a separate struct for each one — and it seems like multiple cache methods can’t be active at the same time anyway.
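A sketch of what such a single struct might look like. The enum and field names here are invented for illustration and are not the actual sd.cpp API; the field set just mirrors the parameters discussed in this thread:

```cpp
// Hypothetical unified cache configuration: one struct for all cache
// methods, with a mode enum selecting which (if any) is active.
#include <cassert>

enum sd_cache_mode_t {
    SD_CACHE_NONE = 0,   // caching disabled (the default when not configured)
    SD_CACHE_EASYCACHE,  // EasyCache for DiT-style models
    SD_CACHE_UCACHE,     // ucache for UNet models
};

struct sd_cache_params_t {
    sd_cache_mode_t mode = SD_CACHE_NONE;
    // Shared parameters; a given mode simply ignores fields it doesn't use.
    float threshold     = 0.2f;   // skip threshold (meaning varies per mode)
    float start_percent = 0.0f;   // fraction of steps before caching starts
    float end_percent   = 0.95f;  // fraction of steps after which caching stops
};
```

Adding a new cache method then only means adding an enum value (and, if needed, a field that other modes ignore), so frontends can expose a single "cache" setting regardless of model version.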
