Replies: 3 comments
- I've encountered this situation too. Adding a template sometimes causes the model to ramble incoherently and even display strange symbols, especially when discussing restricted topics.
- Not sure if this is your exact issue, but I found that on newer versions of llama.cpp I had to add "--chat-template chatml" to a Mistral-based model that had previously just worked. Without it, the model would ignore the prompt and just talk about random weird stuff.
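For anyone hitting the same thing, the flag goes on the server command line. A minimal sketch (the model filename here is just a placeholder, not a specific recommendation):

```sh
# Override whatever template the GGUF embeds and force ChatML
# (model filename is a hypothetical example)
./llama-server --model mistral-7b-instruct-v0.3.Q4_K_M.gguf --chat-template chatml
```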
- Hi there. I have a similar issue: when adding --chat-template gemma, the server answers "unable to tokenize prompt" and crashes.
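To help narrow that down, it can be worth checking what template the file actually ships with before overriding it: dump the GGUF metadata and look for the tokenizer.chat_template key. A sketch, assuming the gguf Python package is installed and with a hypothetical filename:

```sh
# Print GGUF metadata; the embedded template lives under
# tokenizer.chat_template (requires: pip install gguf)
gguf-dump gemma-2-9b-it.Q4_K_M.gguf | grep -A 2 chat_template
```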
As the title says, every chat template seems to be broken in llama-server.
It's very possible that the issue is between the chair and the keyboard (me), but I don't know where to go from here. It's not some super recent change, since I tried building slightly older versions.
I asked the SOTA models (Gemini), of course, but I couldn't get it fixed.
I tried different sizes of Qwen from bartowski and unsloth, plus some random other models from bartowski, but after 2 or 3 messages the models start responding to themselves, or mixing up who said what.
I'm adding --jinja, but I also tried without it. I even tried adding --reasoning-format deepseek for Qwen; the output is a bit different, but the model still responds to its own messages.
./llama-server.exe --model Qwen3-30B-A3B-Thinking-2507-UD-Q8_K_XL.gguf --jinja -ngl 99 --threads -1 --ctx-size 131072 --temp 0.7 --min-p 0.0 --top-p 0.80 --top-k 20 --presence-penalty 1.0 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on -ot ".ffn_(up|down)_exps.=CPU"
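One way to take the web UI out of the picture is to send a short multi-turn conversation straight to llama-server's OpenAI-compatible endpoint and check whether the model stays in the assistant role. A sketch, assuming the default host and port (the conversation content is made up):

```sh
# Multi-turn request against the OpenAI-compatible API (default port 8080)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Name one planet."},
      {"role": "assistant", "content": "Mars."},
      {"role": "user", "content": "Name another one."}
    ]
  }'
```

If the reply here already mixes up who said what, the problem is in the template handling itself rather than the client.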