Uncond-Zero-for-ComfyUI

Lets you sample with Stable Diffusion without generating any negative prediction!

I did this as a personal challenge: how good can a generation be without a negative prediction while following these rules:

No LCM/Turbo/Lightning or any similar method used to develop the tool ✔
Nothing that makes the sampling noticeably slower than euler with a CFG scale of 1 ✔
Should work with "confusing prompts" that tend to make a mess, like "macro shot of a glowing forest spirit,leafy appendages outlined with veins of light,eyes a deep,enigmatic glow amidst the foliage.," ✔
Should allow the use of a negative prompt despite not generating a negative prediction (shout out to Clybius, who helped me get started with the maths!) ✔
Should work with a maximum of 12 steps ✔

The goal is to enhance the sampling and take even more advantage of other acceleration methods such as TensorRT engines.

With an RTX4070:

SDXL 1024x1024 / tensor rt: 9.67it/s

LCM SD 1.5 512x512 / tensor rt: 37.50it/s

⚠ Examples will be at the bottom ⚠

Nodes

Uncond Zero

Connect it like a normal model patch, generally right after the model loader.

Scale: similar to the CFG scale. I implemented logic inspired by my other node, AutomaticCFG, with a few modifications to adapt it to sampling without any negative.
"pre_fix": uses the previous step to modify the current one. This is the main trick for getting better quality / sharpness.
"pre_scale": how strong the effect will be.
- Recommended: 1 for sde/ancestral samplers, 1.5 if you want to use something like dpmpp2m.

IF THE CFG SCALE IS AT 1 OR IF THERE IS NO NEGATIVE (using the ConditioningSetTimestepRange node):

does what is described above

ELSE:

Acts like the Automatic CFG
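
For context on why a CFG scale of 1 is the reference point: in the standard classifier-free guidance formula, the negative prediction cancels out when the scale is 1, so it does not need to be computed at all. The sketch below uses plain Python lists as stand-ins for predictions. `cfg_combine` is the textbook formula, not this node's scaling logic, and `uses_uncond_zero_path` is a hypothetical helper that only mirrors the IF/ELSE described above.

```python
def cfg_combine(cond, uncond, scale):
    """Textbook classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def uses_uncond_zero_path(cfg_scale, has_negative):
    """Hypothetical helper: True when the node would apply its own logic
    (CFG scale at 1, or no negative), False when it would act like Automatic CFG."""
    return cfg_scale == 1 or not has_negative


cond = [0.5, -1.0, 2.0]    # stand-in for the positive prediction
uncond = [0.25, 0.0, 1.0]  # stand-in for the negative prediction

# At scale 1 the result is exactly the positive prediction.
print(cfg_combine(cond, uncond, 1.0))  # [0.5, -1.0, 2.0]
# At scale 2 the negative prediction starts to matter.
print(cfg_combine(cond, uncond, 2.0))  # [0.75, -2.0, 3.0]
```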

Conditioning combine positive and negative

Modifies the positive conditioning using the negative.

It treats the negative conditioning the same way, in case you want to use it during normal sampling, but its main purpose is only for the positive.

Caveat: the combination only goes as far as the shortest conditioning. This means that if your negative is 3 x 77 tokens and your positive only 2 x 77, only 2/3 of your negative will be taken into account.
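
The arithmetic behind this caveat can be checked with a small sketch. `combined_span` is a hypothetical helper, not part of the node. It only counts how many tokens overlap and says nothing about how the two conditionings are actually mixed.

```python
CHUNK = 77  # tokens per CLIP chunk


def combined_span(positive_chunks, negative_chunks):
    """Return (tokens used, fraction of the negative that is used)."""
    pos_tokens = positive_chunks * CHUNK
    neg_tokens = negative_chunks * CHUNK
    used = min(pos_tokens, neg_tokens)  # stops at the shortest conditioning
    return used, used / neg_tokens


used, fraction = combined_span(positive_chunks=2, negative_chunks=3)
print(used)                # 154
print(round(fraction, 2))  # 0.67
```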

Conditioning crop or fill

This node allows you to use longer or shorter prompts with TensorRT engines.

When creating a TensorRT engine, you can set the context length.

Here, "context_opt" is set to 4:

This is how long your context will be, meaning how many chunks of 77 tokens you can use.

The issue is that if you set it to 1, any longer prompt will make it spam your CLI and ignore the extra tokens.

If you set it to more than 1 during creation and use a shorter conditioning, it will generate noise while spamming the CLI.

So what this node does is simply let you set the desired context length. If your conditioning is longer, it will crop it. If it is shorter, it will concatenate an empty one until the length is reached.
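
The crop-or-fill behaviour described above can be sketched as follows. This is a minimal illustration using plain Python lists, where each list item stands in for one token embedding. The function name `crop_or_fill` and its parameters are hypothetical, and the real node works on ComfyUI conditioning tensors rather than lists.

```python
CHUNK = 77  # tokens per CLIP chunk


def crop_or_fill(cond, empty_chunk, context_length):
    """Crop or pad a conditioning to exactly context_length * 77 tokens."""
    if not empty_chunk:
        raise ValueError("empty_chunk must contain at least one token")
    target = context_length * CHUNK
    out = list(cond[:target])       # crop if the conditioning is too long
    while len(out) < target:        # otherwise append the empty conditioning
        out.extend(empty_chunk)
    return out[:target]


empty = ["e"] * CHUNK          # stand-in for an encoded empty prompt
long_prompt = ["p"] * (3 * CHUNK)
short_prompt = ["p"] * CHUNK

cropped = crop_or_fill(long_prompt, empty, context_length=2)
filled = crop_or_fill(short_prompt, empty, context_length=4)
print(len(cropped), len(filled))  # 154 308
```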

interrupt on NaN

While I have not seen any since the latest updates, TensorRT would sometimes throw a random black image. This node cancels the sampling if any invalid value is detected. It is also useful if you want to test Uncond Zero with bogus scales. The toggle replaces these values with 0 instead of cancelling.
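
As a rough sketch of that behaviour, the check could look like the following. Everything here is illustrative: `SamplingInterrupted` and `check_values` are hypothetical names, a flat list of floats stands in for the latent, and treating infinities as invalid alongside NaN is an assumption.

```python
import math


class SamplingInterrupted(Exception):
    """Hypothetical exception standing in for a cancelled sampling run."""


def check_values(values, replace_with_zero=False):
    """Cancel on invalid values, or replace them with 0 when the toggle is on."""
    if all(math.isfinite(v) for v in values):
        return list(values)
    if replace_with_zero:
        # Assumption: both NaN and +/-inf count as invalid.
        return [v if math.isfinite(v) else 0.0 for v in values]
    raise SamplingInterrupted("invalid value detected, sampling cancelled")


print(check_values([0.5, float("nan"), -1.0], replace_with_zero=True))  # [0.5, 0.0, -1.0]
```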

Examples

(all images are workflows)

Nothing versus everything (SDXL/tensorrt), same generation speed:

07851UI_00001_ 07847UI_00001_

SD 1.5 (merge) with LCM in 3 steps.

Vanilla / Only with the prediction scaled / "pre_fix" Enabled added / Negative prompt added:

07918UI_00001_ 07917UI_00001_ 07916UI_00001_ 07913UI_00001_ - Copie

Negative prompt integration example:

Just "bad quality" (everything after will also have "bad quality" at the end):

negative bad quality

Summer in the negative:

negative summer

Winter:

negative winter

Water:

water

Water, autumn:

negative water, autumn

pre_fix

off / 0.5 / 1

combined_image

"skill issue"

You too! Discover how this man went from a bland face

07922UI_00001_

To a smiling average dude:

07834UI_00001_

To this very successful businessman with five fingers!

07841UI_00001_

All use the same seed. The first image is "a man with a sad face" without any modification.

The second has all the modifications enabled, but the prompt is only "a smiling man".

The third one is "a smiling man wearing a suit, hiding behind a tree, hdr quality".

Or in short: a better prompt will actually give you a better result. While it may seem obvious, a negative prediction generally makes the result good even when the prompt is simple, whereas without it a simple prompt does not. If anything, that is for me the biggest (if big) caveat: I am not allowed to be as lazy as I like, and it sometimes forces me to add at least two or three words to my prompts to make them better 😪.

Tips:

You can use my temperature node to lower or raise the CLIP temperature; it will greatly change the output!
I wouldn't be against SOME support! :)

Pro tip:

Did you know that my main activity is writing creative model merging functions?

While the code is too much of a mess to be shared, I do expose and share my models. You can find them in this gallery! 😁
