feat: Longcat-Image / Longcat-Image-Edit support #1053
|
That does look a bit like a circuit board... |
|
I can't figure out what I'm doing wrong. I think it is supposed to work just like Flux1, but with different PE indices and the Qwen text encoder. Maybe I'm missing an important detail, but I can't find it. |
|
Oh no, why are there so many conflicts now? |
|
Using |
| if (sd_version_is_flux(version) || sd_version_is_z_image(version) || sd_version_is_longcat(version)) { | ||
| latent_rgb_proj = flux_latent_rgb_proj; | ||
| latent_rgb_bias = flux_latent_rgb_bias; | ||
| patch_sz = 2; |
For flux and z-image, should patch_sz be 1?
It should be, yes, but if the dim is 64, it means that the patch_size must have been set to 2.
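The arithmetic behind that reply can be sketched as follows. This is only an illustration, assuming the Flux.1 VAE's 16 latent channels; `infer_patch_size` is a hypothetical helper, not a function from this codebase.

```cpp
#include <cassert>

// Hypothetical helper (not part of stable-diffusion.cpp):
// returns p such that dim == base_channels * p * p, or 0 if no such p exists.
int infer_patch_size(int dim, int base_channels) {
    if (base_channels <= 0 || dim <= 0 || dim % base_channels != 0) {
        return 0;
    }
    int ratio = dim / base_channels;
    for (int p = 1; p * p <= ratio; ++p) {
        if (p * p == ratio) {
            return p;
        }
    }
    return 0;
}
```

With 16 base channels, a dim of 16 gives a patch size of 1, while a dim of 64 can only come from 2x2 patchified latents.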
| struct ggml_tensor* decode(GGMLRunnerContext* ctx, struct ggml_tensor* z) { | ||
| // z: [N, z_channels, h, w] | ||
| if (sd_version_is_flux2(version)) { | ||
| if (sd_version_is_flux2(version) || sd_version_is_longcat(version)) { |
I think for longcat, it’s unnecessary to set the VAE patch_size to 2. With the Flux architecture, setting the patch_size to 2 there is already sufficient.
Yes, I did try that before, and it seems to work just as well (though the results are completely different with the same seed, because the noise pattern changes between the two implementations). I did it that way because that's how it is implemented in the diffusers code (for base Flux.1 too). It also saves an almost negligible amount of compute, since it's not patchifying and unpatchifying at every step.
Should I keep it consistent with Flux.1 instead?
From Longcat’s pipeline, I think it’s actually implemented the same way as Flux1: it generates noise first and then applies patchify, rather than generating patchified latents directly like Flux2.
Ah right. Again, in practice that shouldn't matter too much. But I can make sure it stays consistent with Flux.1.
Thanks, this will make the code more concise.
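For readers following this thread, here is a minimal sketch of the 2x2 patchify/unpatchify step being discussed. It is not the code from this PR: the `Latent` struct, the helper names, and the channel ordering `(c * p + dy) * p + dx` are assumptions made for illustration, and the actual layout in stable-diffusion.cpp or diffusers may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a single latent, stored row-major as [c][h][w].
struct Latent {
    int c, h, w;
    std::vector<float> data;

    float& at(int ci, int y, int x) {
        return data[(static_cast<std::size_t>(ci) * h + y) * w + x];
    }
    float at(int ci, int y, int x) const {
        return data[(static_cast<std::size_t>(ci) * h + y) * w + x];
    }
};

Latent make_latent(int c, int h, int w) {
    return Latent{c, h, w, std::vector<float>(static_cast<std::size_t>(c) * h * w, 0.0f)};
}

// [C, H, W] -> [C*p*p, H/p, W/p]: each p x p spatial patch is folded into channels.
Latent patchify(const Latent& in, int p) {
    assert(p > 0 && in.h % p == 0 && in.w % p == 0);
    Latent out = make_latent(in.c * p * p, in.h / p, in.w / p);
    for (int c = 0; c < in.c; ++c)
        for (int y = 0; y < in.h; ++y)
            for (int x = 0; x < in.w; ++x)
                out.at((c * p + y % p) * p + x % p, y / p, x / p) = in.at(c, y, x);
    return out;
}

// [C*p*p, H/p, W/p] -> [C, H, W]: exact inverse of patchify above.
Latent unpatchify(const Latent& in, int p) {
    assert(p > 0 && in.c % (p * p) == 0);
    Latent out = make_latent(in.c / (p * p), in.h * p, in.w * p);
    for (int c = 0; c < out.c; ++c)
        for (int y = 0; y < out.h; ++y)
            for (int x = 0; x < out.w; ++x)
                out.at(c, y, x) = in.at((c * p + y % p) * p + x % p, y / p, x / p);
    return out;
}
```

Under this layout, a 16-channel latent becomes a 64-channel one at half the spatial resolution, and the round trip is lossless. The flat buffer order changes, though, so filling a patchified tensor directly with seeded noise places the values differently than generating noise first and patchifying afterwards. That is consistent with the seed-dependent differences mentioned above.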
|
May I ask which comfyui node is used to load this GGUF model? |

for #1052
sd.exe --diffusion-model ..\ComfyUI\models\unet\LongCat-Image-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.0 --sampling-method euler -v --clip-on-cpu -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: \"THE CITY IS A CIRCUIT BOARD, AND I AM A LONG CAT.\" -- moody, atmospheric, profound, dark academic" --preview proj --steps 20 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --diffusion-fa --color -W 1024 -H 1024
Test models (converted to bfl format) can be found there:
Inference for models in diffusers format still seems to be broken.