Model Priority & Model Pinning for Multi-Model / Router Mode #17979
bluemoehre started this conversation in Ideas
I'm currently setting up a small "home assistant / work support landscape" for myself:
a primary intelligence (gpt-oss-120b) and a coding assistant (Qwen-3 Coder) that I use in VS Code all day.
But my new AI server still has VRAM left (and it was expensive 😉). So I want to make the most of it by adding occasional services such as TTS or TTI and whatever comes next.
At the moment I'm juggling four Docker containers, starting and stopping the extra services as needed. My custom scripts for doing that have become a bit of a code jungle, and I'm not happy with it.
I then discovered llama-swap, which looked promising, but after digging a bit deeper I realized it would introduce yet another tool and add even more complexity, while still leaving me with three containers anyway.
A few minutes ago I ran into the news about the llama-server router, which is fantastic! It eliminates that extra tool, so that's a win. However, I would still have to run three containers to guarantee that both the main LLM and the coder are always available.
To get rid of most of the complexity in this scenario and collapse everything into a single Docker container, fully flexible LLM/VRAM management would help.
TL;DR
So what about a priority / pinning mechanism with two options:
- Priority: a per-model value that decides which models get swapped out first when VRAM runs low.
- Pinning: a flag that keeps a model loaded at all times (here: the main LLM and the coder).
Such a feature would be a straightforward and elegant fit for home-grown multi-model stacks, because it preserves the conversational flow with the user. I'd much rather have the system say something like:
... than leave the user staring at dead silence while models are swapped back and forth. =D
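To make the idea concrete, here is a minimal sketch of the eviction policy I have in mind. This is not llama-server code; `ModelSlot`, `pick_models_to_unload` and the VRAM numbers are made up for illustration. The idea: pinned models are never unloaded, and among the remaining loaded models the lowest-priority ones are evicted first; if even that cannot free enough VRAM, the router can answer the user right away instead of silently stalling.

```python
from dataclasses import dataclass

@dataclass
class ModelSlot:
    name: str
    vram_gb: float
    priority: int   # higher = more important, evicted last
    pinned: bool    # pinned models are never evicted
    loaded: bool

def pick_models_to_unload(slots: list[ModelSlot], needed_vram_gb: float) -> list[ModelSlot] | None:
    """Pick loaded, unpinned models to evict, lowest priority first.

    Returns the models to unload, or None if evicting every unpinned model
    still would not free enough VRAM (so the request can be rejected with a
    friendly message instead of stalling).
    """
    candidates = sorted(
        (s for s in slots if s.loaded and not s.pinned),
        key=lambda s: s.priority,
    )
    evict, freed = [], 0.0
    for slot in candidates:
        if freed >= needed_vram_gb:
            break
        evict.append(slot)
        freed += slot.vram_gb
    return evict if freed >= needed_vram_gb else None

# Example: main LLM and coder are pinned, occasional services come and go.
slots = [
    ModelSlot("gpt-oss-120b", 64.0, priority=10, pinned=True,  loaded=True),
    ModelSlot("qwen3-coder",  24.0, priority=10, pinned=True,  loaded=True),
    ModelSlot("tts",           6.0, priority=1,  pinned=False, loaded=True),
]
print(pick_models_to_unload(slots, needed_vram_gb=5.0))   # -> [tts slot]
print(pick_models_to_unload(slots, needed_vram_gb=40.0))  # -> None, tell the user instead of stalling
```

With the main LLM and the coder pinned, an occasional TTS or TTI request could only ever displace other occasional services, never the two assistants.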