
Conversation

@stduhpf
Contributor

@stduhpf stduhpf commented Dec 5, 2025

for #1052

sd.exe --diffusion-model ..\ComfyUI\models\unet\LongCat-Image-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.0 --sampling-method euler -v --clip-on-cpu -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: \"THE CITY IS A CIRCUIT BOARD, AND I AM A LONG CAT.\" -- moody, atmospheric, profound, dark academic" --preview proj --steps 20 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --diffusion-fa --color -W 1024 -H 1024

output

Test models (converted to bfl format) can be found here:

Inference for models in diffusers format still seems to be broken.

@wbruna
Contributor

wbruna commented Dec 5, 2025

That does look a bit like a circuit board...

@stduhpf
Contributor Author

stduhpf commented Dec 6, 2025

TODO for when image generation works
image

@stduhpf
Contributor Author

stduhpf commented Dec 6, 2025

I can't figure out what I'm doing wrong. I think it's supposed to work just like Flux1, but with different PE indices and the Qwen text encoder... Maybe I'm missing an important detail, but I can't find it.

@stduhpf
Contributor Author

stduhpf commented Dec 7, 2025

I tried using my SplitAttention thing on a Flux model converted to diffusers format, and
output
I guess I found what is not working. I will try converting LongCat to Flux format and see if it works.

@stduhpf
Contributor Author

stduhpf commented Dec 8, 2025

I think I got it?
output

With the padding fixed, but with diffusers format:
output

@stduhpf
Contributor Author

stduhpf commented Dec 8, 2025

With the character-level tokenization trick:
output

This might need testing to make sure the current implementation supports languages that don't use the Latin alphabet. Also, for now it's only applied to text wrapped in single quotes (').
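
For reference, the trick is roughly this (just a sketch of the idea, not the actual implementation; split_utf8_chars, tokenize_text and tokenize_prompt are made-up names): text inside a quoted span is tokenized one UTF-8 character at a time instead of going through the usual BPE merges, the idea being that this helps the model spell out the literal text.

// Sketch of the quoted-span / character-level tokenization idea.
// tokenize_text() is a stand-in for the real tokenizer call.
#include <cstdint>
#include <string>
#include <vector>

// Split a UTF-8 string into individual characters (one substring per code point).
static std::vector<std::string> split_utf8_chars(const std::string& s) {
    std::vector<std::string> out;
    for (size_t i = 0; i < s.size();) {
        size_t len = 1;
        unsigned char c = (unsigned char)s[i];
        if ((c & 0xE0) == 0xC0) len = 2;
        else if ((c & 0xF0) == 0xE0) len = 3;
        else if ((c & 0xF8) == 0xF0) len = 4;
        out.push_back(s.substr(i, len));
        i += len;
    }
    return out;
}

// Stand-in for the real tokenizer (here it just returns one fake id per byte).
static std::vector<int32_t> tokenize_text(const std::string& text) {
    std::vector<int32_t> ids;
    for (unsigned char c : text) ids.push_back((int32_t)c);
    return ids;
}

// Tokenize a prompt, switching to per-character tokenization inside quoted spans.
static std::vector<int32_t> tokenize_prompt(const std::string& prompt, char delimiter = '\'') {
    std::vector<int32_t> tokens;
    std::string chunk;
    bool in_quotes = false;
    auto flush = [&]() {
        if (chunk.empty()) return;
        if (in_quotes) {
            // literal text: one tokenizer call per character, so no merges across characters
            for (const std::string& ch : split_utf8_chars(chunk)) {
                std::vector<int32_t> t = tokenize_text(ch);
                tokens.insert(tokens.end(), t.begin(), t.end());
            }
        } else {
            std::vector<int32_t> t = tokenize_text(chunk);
            tokens.insert(tokens.end(), t.begin(), t.end());
        }
        chunk.clear();
    };
    for (char c : prompt) {
        if (c == delimiter) {
            flush();
            in_quotes = !in_quotes;
        } else {
            chunk += c;
        }
    }
    flush();
    return tokens;
}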

@stduhpf stduhpf marked this pull request as ready for review December 8, 2025 01:29
@stduhpf
Contributor Author

stduhpf commented Dec 8, 2025

Oh no, why are there so many conflicts now?

@stduhpf
Contributor Author

stduhpf commented Dec 8, 2025

Using ' as a quote delimiter was a bad idea because it's the same symbol used for apostrophes. I will change it to detect " instead.

@stduhpf
Contributor Author

stduhpf commented Dec 8, 2025

Somehow it's not fully working yet, but it can definitely tell it's supposed to be a cat holding a sign, maybe thanks to the vision model:
sd.exe --diffusion-model ..\ComfyUI\models\unet\longcat_edit_bfl_format-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.5 --sampling-method euler -v --offload-to-cpu --preview proj --steps 50 --vae-tile-size 128 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --color --seed 0 -r .\assets\flux\flux1-dev-q8_0.png --llm_vision ..\ComfyUI\models\clip_vision\Qwen2.5-VL-7B-Instruct.mmproj-f16.gguf -p "Change the text to say \"I'm a long one\""

ref out
flux1-dev-q8_0 output

(Also, I made the change, so it now needs double quotes around literal text.)

@stduhpf
Contributor Author

stduhpf commented Dec 8, 2025

output

Somehow I couldn't get it to remove the original text, but there it goes.

@stduhpf stduhpf changed the title Wip: Longcat-Image support Longcat-Image / Longcat-Image-Edit support Dec 8, 2025
@stduhpf stduhpf changed the title Longcat-Image / Longcat-Image-Edit support feat: Longcat-Image / Longcat-Image-Edit support Dec 8, 2025
if (sd_version_is_flux(version) || sd_version_is_z_image(version) || sd_version_is_longcat(version)) {
latent_rgb_proj = flux_latent_rgb_proj;
latent_rgb_bias = flux_latent_rgb_bias;
patch_sz = 2;
Owner

For flux and z-image, should patch_sz be 1?

Contributor Author

It should be, yes, but if the dim is 64, it means the patch_size must have been set to 2 (i.e. the latents have already been packed into 2×2 patches).

struct ggml_tensor* decode(GGMLRunnerContext* ctx, struct ggml_tensor* z) {
// z: [N, z_channels, h, w]
if (sd_version_is_flux2(version)) {
if (sd_version_is_flux2(version) || sd_version_is_longcat(version)) {
Owner

I think for longcat, it’s unnecessary to set the VAE patch_size to 2. With the Flux architecture, setting the patch_size to 2 there is already sufficient.

Contributor Author

Yes, I did try that before, and it seems to work just as well (though the results are completely different with the same seed, because the noise pattern changes between the two implementations). I did it that way because that's how it is implemented in the diffusers code (for base Flux.1 too). It also saves an almost negligible amount of compute, since it doesn't patchify and unpatchify at every step.
Should I keep it consistent with Flux.1 instead?

Owner

From Longcat’s pipeline, I think it’s actually implemented the same way as Flux1: it generates noise first and then applies patchify, rather than generating patchified latents directly like Flux2.

https://github.com/meituan-longcat/LongCat-Image/blob/main/longcat_image/pipelines/pipeline_longcat_image.py#L359

Contributor Author

Ah right. Again, in practice that shouldn't matter too much. But I can make sure it stays consistent with Flux.1.
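
To make the difference concrete, here is a rough sketch of the Flux.1-style order (illustrative shapes and names only, not the repo's code): noise is sampled at the full latent resolution and then rearranged with a 2x2 space-to-depth step before going to the transformer, instead of sampling the packed shape directly.

// Illustrative sketch of the Flux.1-style order: sample noise at the full
// latent resolution, then patchify (2x2 space-to-depth) before the transformer.
// Shapes and names are made up for the example.
#include <cstddef>
#include <random>
#include <vector>

// Rearrange [C, H, W] -> [C*p*p, H/p, W/p] (space-to-depth with patch size p).
static std::vector<float> patchify(const std::vector<float>& x, int C, int H, int W, int p) {
    const int Hp = H / p, Wp = W / p;
    std::vector<float> out((size_t)C * p * p * Hp * Wp);
    for (int c = 0; c < C; c++)
        for (int y = 0; y < H; y++)
            for (int xw = 0; xw < W; xw++) {
                int oc     = c * p * p + (y % p) * p + (xw % p);      // packed channel index
                size_t src = ((size_t)c * H + y) * W + xw;
                size_t dst = ((size_t)oc * Hp + y / p) * Wp + xw / p;
                out[dst]   = x[src];
            }
    return out;
}

int main() {
    const int C = 16, H = 128, W = 128, p = 2;  // e.g. 1024x1024 image -> 128x128x16 latent
    std::mt19937 rng(0);
    std::normal_distribution<float> gauss(0.f, 1.f);
    std::vector<float> noise((size_t)C * H * W);
    for (float& v : noise) v = gauss(rng);                    // noise at full latent resolution
    std::vector<float> packed = patchify(noise, C, H, W, p);  // [64, 64, 64] goes to the transformer
    (void)packed;
    return 0;
}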

Owner

Thanks, this will make the code more concise.

@Rocky-Lee-001

May I ask which ComfyUI node is used to load this GGUF model?
