feat: Longcat-Image / Longcat-Image-Edit support #1053

stduhpf · 2025-12-05T20:04:19Z

sd.exe --diffusion-model ..\ComfyUI\models\unet\LongCat-Image-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.0 --sampling-method euler -v --clip-on-cpu -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: \"THE CITY IS A CIRCUIT BOARD, AND I AM A LONG CAT.\" -- moody, atmospheric, profound, dark academic" --preview proj --steps 20 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --diffusion-fa --color -W 1024 -H 1024

Test models (converted to BFL format) can be found here:

Inference for models in diffusers format still seems to be broken.

wbruna · 2025-12-05T20:32:10Z

That does look a bit like a circuit board...

stduhpf · 2025-12-06T02:12:01Z

TODO for when image generation works

stduhpf · 2025-12-06T15:06:41Z

I can't figure out what I'm doing wrong. I think it is supposed to work just like Flux.1, but with different PE indices and the Qwen text encoder... Maybe I'm missing an important detail, but I can't find it.

stduhpf · 2025-12-07T21:40:50Z

I tried using my SplitAttention thing on a Flux model converted to diffusers format, and

I guess I found what is not working. I will try converting LongCat to Flux format and see if it works.

stduhpf · 2025-12-08T00:39:02Z

I think I got it?

With the padding fixed, but with diffusers format:

stduhpf · 2025-12-08T01:23:10Z

With the character-level tokenization trick:

This might need testing to make sure the current implementation supports languages that don't use the Latin alphabet. Also, for now it's only applied to text wrapped in single quotes (').
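A minimal C++ sketch of the character-level trick, assuming a UTF-8 prompt: text inside the quote delimiter is split into one segment per Unicode code point (so non-Latin scripts are not cut in the middle of a multi-byte character), while text outside the quotes is left whole for the normal tokenizer. `split_quoted_chars` and `utf8_len` are hypothetical names, this is not the PR's code, and how the real implementation hands the pieces to the Qwen tokenizer is not shown.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Byte length of the UTF-8 sequence that starts with lead byte c.
// Assumes well-formed UTF-8; stray continuation/invalid bytes count as 1 byte.
static std::size_t utf8_len(unsigned char c) {
    if (c < 0x80) return 1;          // 0xxxxxxx: ASCII
    if ((c >> 5) == 0x6) return 2;   // 110xxxxx
    if ((c >> 4) == 0xE) return 3;   // 1110xxxx
    if ((c >> 3) == 0x1E) return 4;  // 11110xxx
    return 1;
}

// Hypothetical helper: text outside `quote` delimiters is kept as one segment
// per run; text inside is split into one segment per UTF-8 code point.
// Each delimiter is emitted as its own segment.
std::vector<std::string> split_quoted_chars(const std::string& prompt, char quote) {
    std::vector<std::string> out;
    std::string run;  // accumulates unquoted text
    bool inside = false;
    std::size_t i = 0;
    while (i < prompt.size()) {
        unsigned char c = static_cast<unsigned char>(prompt[i]);
        if (c == static_cast<unsigned char>(quote)) {
            if (!run.empty()) { out.push_back(run); run.clear(); }
            out.push_back(std::string(1, quote));
            inside = !inside;
            ++i;
            continue;
        }
        std::size_t n = utf8_len(c);
        if (i + n > prompt.size()) n = prompt.size() - i;  // guard truncated input
        if (inside) {
            out.push_back(prompt.substr(i, n));  // one code point per segment
        } else {
            run += prompt.substr(i, n);
        }
        i += n;
    }
    if (!run.empty()) out.push_back(run);
    return out;
}
```

For example, `split_quoted_chars("say 'hi'", '\'')` yields `say `, `'`, `h`, `i`, `'`, and a quoted three-byte CJK character comes out as a single segment rather than three bytes.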

stduhpf · 2025-12-08T01:34:26Z

Oh no, why are there so many conflicts now?

stduhpf · 2025-12-08T11:38:51Z

Using ' as a quote delimiter was a bad idea, because it's the same symbol used for apostrophes. I will change it to detect " instead.
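A small C++ sketch of why apostrophes break `'`-delimited detection, using a naive pairwise matcher (a hypothetical `quoted_spans` helper, not the PR's code): the apostrophe in "cat's" is taken as an opening quote, so the captured span is wrong, whereas `"` captures the intended text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the text between each pair of `quote` characters,
// matching delimiters strictly left to right. An unmatched trailing delimiter is ignored.
std::vector<std::string> quoted_spans(const std::string& prompt, char quote) {
    std::vector<std::string> spans;
    std::size_t start = std::string::npos;
    for (std::size_t i = 0; i < prompt.size(); ++i) {
        if (prompt[i] != quote) continue;
        if (start == std::string::npos) {
            start = i + 1;  // opening delimiter
        } else {
            spans.push_back(prompt.substr(start, i - start));  // closing delimiter
            start = std::string::npos;
        }
    }
    return spans;
}
```

With `'` as the delimiter, `the cat's sign says 'meow'` yields the single span `s sign says ` (the quote after `meow` is left unmatched). With `"`, a prompt like `Change the text to say "I'm a long one"` yields `I'm a long one`, apostrophe included.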

stduhpf · 2025-12-08T12:47:54Z

It's somehow not fully working yet, but it's definitely able to see that it's supposed to be a cat holding a sign, maybe because of the vision model.
sd.exe --diffusion-model ..\ComfyUI\models\unet\longcat_edit_bfl_format-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.5 --sampling-method euler -v --offload-to-cpu --preview proj --steps 50 --vae-tile-size 128 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --color --seed 0 -r .\assets\flux\flux1-dev-q8_0.png --llm_vision ..\ComfyUI\models\clip_vision\Qwen2.5-VL-7B-Instruct.mmproj-f16.gguf -p "Change the text to say \"I'm a long one\""

[Images: reference image and output]

(Also, I made the change, so literal text now needs to be wrapped in double quotes.)

stduhpf · 2025-12-08T13:34:28Z

I somehow couldn't get it to remove the original text, but here it is:

leejet · 2025-12-09T14:07:53Z

stable-diffusion.cpp

+                if (sd_version_is_flux(version) || sd_version_is_z_image(version) || sd_version_is_longcat(version)) {
+                    latent_rgb_proj = flux_latent_rgb_proj;
+                    latent_rgb_bias = flux_latent_rgb_bias;
+                    patch_sz        = 2;


For flux and z-image, should patch_sz be 1?

It should be, yes, but if the dim is 64, it means that the patch_size must have been set to 2.
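The reasoning comes down to simple arithmetic: the Flux.1 VAE produces 16 latent channels, and packing p×p patches multiplies the per-token dim by p², so a dim of 64 implies p = 2 (16 × 2 × 2 = 64), while an unpatched latent has dim 16. A minimal sketch with a hypothetical helper `infer_patch_size` (not a function in stable-diffusion.cpp):

```cpp
#include <cmath>

// Hypothetical helper: recover the patch size p such that dim == z_channels * p * p.
// Returns 0 when dim is not of that form.
int infer_patch_size(int dim, int z_channels) {
    if (z_channels <= 0 || dim % z_channels != 0) return 0;
    int p2 = dim / z_channels;  // p squared
    int p  = static_cast<int>(std::lround(std::sqrt(static_cast<double>(p2))));
    return (p * p == p2) ? p : 0;
}
```

With 16 latent channels, `infer_patch_size(64, 16)` gives 2 and `infer_patch_size(16, 16)` gives 1.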

leejet · 2025-12-09T14:09:21Z

vae.hpp

    struct ggml_tensor* decode(GGMLRunnerContext* ctx, struct ggml_tensor* z) {
        // z: [N, z_channels, h, w]
-        if (sd_version_is_flux2(version)) {
+        if (sd_version_is_flux2(version) || sd_version_is_longcat(version)) {


I think for longcat, it’s unnecessary to set the VAE patch_size to 2. With the Flux architecture, setting the patch_size to 2 there is already sufficient.

Yes, I did try that before, and it seems to work just as well (though the results are completely different with the same seed, because the noise pattern changes between the two implementations). I did it that way because that's how it is implemented in the diffusers code (for base Flux.1 too). It also saves a small, almost negligible amount of compute, since it avoids patchifying and unpatchifying at every step.
Should I keep it consistent with Flux.1 instead?

From Longcat’s pipeline, I think it’s actually implemented the same way as Flux1: it generates noise first and then applies patchify, rather than generating patchified latents directly like Flux2.
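A toy C++ sketch of why the same seed gives different images: the same stream of Gaussian noise is either drawn in the unpatched [C, H, W] layout and then packed into 2×2 patches (the Flux.1-style order described above), or drawn directly in the packed [(H/2)·(W/2), C·4] layout (Flux.2-style). The within-patch ordering `c*4 + py*2 + px` follows diffusers-style Flux latent packing as far as I know, but treat it as an assumption; `patchify`, `unpatchify` and `noise` are hypothetical names, and H and W are assumed even.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Pack [C, H, W] into [(H/2)*(W/2), C*4] tokens of 2x2 patches.
// Feature index within a token: c*4 + py*2 + px (assumed ordering).
std::vector<float> patchify(const std::vector<float>& x, int C, int H, int W) {
    const int w = W / 2;
    std::vector<float> out(x.size());
    for (int c = 0; c < C; ++c)
        for (int y = 0; y < H; ++y)
            for (int xx = 0; xx < W; ++xx) {
                int token = (y / 2) * w + (xx / 2);
                int feat  = c * 4 + (y % 2) * 2 + (xx % 2);
                out[static_cast<std::size_t>(token) * C * 4 + feat] =
                    x[(static_cast<std::size_t>(c) * H + y) * W + xx];
            }
    return out;
}

// Inverse of patchify: [(H/2)*(W/2), C*4] -> [C, H, W].
std::vector<float> unpatchify(const std::vector<float>& t, int C, int H, int W) {
    const int w = W / 2;
    std::vector<float> out(t.size());
    for (int c = 0; c < C; ++c)
        for (int y = 0; y < H; ++y)
            for (int xx = 0; xx < W; ++xx) {
                int token = (y / 2) * w + (xx / 2);
                int feat  = c * 4 + (y % 2) * 2 + (xx % 2);
                out[(static_cast<std::size_t>(c) * H + y) * W + xx] =
                    t[static_cast<std::size_t>(token) * C * 4 + feat];
            }
    return out;
}

// Draw n standard-normal values from a seeded generator.
std::vector<float> noise(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> v(n);
    for (auto& e : v) e = dist(gen);
    return v;
}
```

For example, `patchify(noise(16 * 4 * 4, 42), 16, 4, 4)` (noise first, then pack) and `noise(16 * 4 * 4, 42)` (noise drawn already packed) hold exactly the same values at different positions. Both are valid Gaussian noise, but the model starts from a different latent for the same seed, which matches the observation that switching between the two implementations changes the result.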

https://github.com/meituan-longcat/LongCat-Image/blob/main/longcat_image/pipelines/pipeline_longcat_image.py#L359

Ah right. Again, in practice that shouldn't matter too much. But I can make sure it stays consistent with Flux.1.

Thanks, this will make the code more concise.

Rocky-Lee-001 · 2025-12-10T06:06:52Z

May I ask which ComfyUI node is used to load this GGUF model?

stduhpf added 7 commits December 2, 2025 23:28

add Flux.2 VAE proj matrix for previews

80784fb

Enable flux.2 proj for preview with flux model

beef322

support Flux.2 patched latents for proj preview

77e4620

move latent shuffle logic to latents-preview.h

da8e95e

refactor preview_latent_video to support flux.2 patchified latents

c054c23

Support LongCat Image model

e47c8c4

temp fix cuda error on quant concat for splitlinear

2c44904

stduhpf added 3 commits December 5, 2025 22:09

Merge branch 'flux2-proj' into longcat

6e9f5ec

pre-patchify

00071aa

longcat rope ids

7711efb

stduhpf added 3 commits December 6, 2025 03:47

Fix diffusers_style detection

535543a

Flux: simplify when patch_size is 1

61b0dcf

correct rope offset for image tokens

deaf939

stduhpf added 2 commits December 8, 2025 01:05

stuff

da0605f

Fix token length

bf82af9

Split quoted text into character-level tokens

a51f137

stduhpf marked this pull request as ready for review December 8, 2025 01:29

remove debug logs

148120b

support longcat-image-edit

7c8f0dc

Fix base rope offset for ref images

1b56d19

stduhpf changed the title ~~Wip: Longcat-Image support~~ Longcat-Image / Longcat-Image-Edit support Dec 8, 2025

stduhpf changed the title ~~Longcat-Image / Longcat-Image-Edit support~~ feat: Longcat-Image / Longcat-Image-Edit support Dec 8, 2025

leejet reviewed Dec 9, 2025


4 participants
