
I cannot believe this doesn't have more stars and attention !!! FREAKIN AWESOME #3

Open
PurpleBlueAloeVera opened this issue Oct 7, 2023 · 19 comments


@PurpleBlueAloeVera

Thank you so much for this. Do you plan on adding SD2.1 to the features? We have some solid 2.1 models that we'd love to use as refiners for 1.5 or SDXL in our workflows.

Seriously, thank you for sharing this awesome tool!

@city96
Owner

city96 commented Oct 7, 2023

Glad you like it. It was just a quick one-off idea I had, and there are definitely improvements that could be made if I had the time.

As for using 2.1: last time I checked, 1.5 and 2.X latents were compatible, so technically "v1" in the dropdown is really "v1/v2". That means it should work fine with 2.1 models without any changes.

@PurpleBlueAloeVera
Author

PurpleBlueAloeVera commented Oct 7, 2023

Nah, I had tried before. SD 2.X and 1.5 are incompatible in the latent space :/

EDIT: NVM, you're right, it actually works. I don't know if it's optimal as is, but it's very surprising to see that the latent space is compatible with 1.x. Damn.

@city96
Owner

city96 commented Oct 8, 2023

Just double checked. Stable diffusion v2.1 and v2.0 both come with the same VAE, which is "ft-mse-840000" - the same one people usually use with SDv1.5. This means it is not only compatible, it's 100% the same latent format as far as I can tell. Sure, the model might be more sensitive to the noise the interposer adds, but an improved xl->v1 interposer would also mean improvements to xl->v2. And to reiterate, there's still plenty of room for improvement. I could probably even take the slightly less scuffed architecture from my latent upscaler and apply it here, though I'd like to design something better once I figure out more about how all this neural network stuff works :P

840K VAE commonly used with 1.5
SD2.1 VAE from the official repository
File is the same. SHA256: a1d993488569e928462932c8c38a0760b874d166399b14414135bd9c42df5815
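
If you want to double-check this yourself, hashing both downloads is enough; here's a minimal sketch (the filenames are just placeholders for wherever you saved the two VAEs):

import hashlib

def sha256sum(path, chunk_size=1 << 20):
    # Stream the file in chunks so large checkpoints don't have to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder paths - point these at the 840K VAE and the SD2.1 VAE you downloaded.
print(sha256sum("vae-ft-mse-840000-ema-pruned.safetensors"))
print(sha256sum("sd21-vae.safetensors"))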

@PurpleBlueAloeVera
Author

@city96 Thanks for your response! If I may ask, how could this (the interposer) be improved, by the way? Also, would there be a way to do this with a single .safetensors file that could keep some kind of "merge" of a 1.5 model with an SDXL one? Or would that be absolutely impossible?

Thanks in advance for your time

@city96
Owner

city96 commented Oct 10, 2023

how could this be improved by the way

Well, the neural network part would have to be changed. Currently it's just a bunch of random conv2D layers that look like a spaceship. I think I have an idea on how to make a better one but yeah, time...

The other thing that needs changing is the dataset, but I think I got a decent one I can re-use from the upscaler. Which means the only other thing I'd need is, again, time to work on this :P
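
For a rough picture of what "a bunch of conv2D layers" means here: judging by the key names in the released files (sequential.0/2/4/6), it's a small sequential conv stack, something along these lines (a sketch only - channel counts and activations are guesses, not the actual repo code):

import torch.nn as nn

# Both v1 and XL latents are 4-channel, so the interposer maps 4 -> 4 channels.
# Layer indices with weights (0, 2, 4, 6) line up with the key list further down.
interposer_sketch = nn.Sequential(
    nn.Conv2d(4, 64, kernel_size=3, padding=1),   # sequential.0
    nn.ReLU(),                                    # sequential.1 (no weights)
    nn.Conv2d(64, 64, kernel_size=3, padding=1),  # sequential.2
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),  # sequential.4
    nn.ReLU(),
    nn.Conv2d(64, 4, kernel_size=3, padding=1),   # sequential.6
)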

would there be a way to do this with a single .safetensors

You mean combining the v1->xl and xl->v1 models into a single file? I mean, that's easy enough to do I guess... You can store multiple models in the same safetensor file just fine.

@PurpleBlueAloeVera
Author

How would you do that? Store multiple models in one single safetensors file? :o

@city96
Owner

city96 commented Oct 10, 2023

Same way SD does it. safetensors files just store key:value pairs (in this case the values are the network weights). You can just add a prefix to all the keys so you can grab the ones you need while loading.

For example, all of the stable diffusion checkpoint files will have a bunch of keys starting with "first_stage_model" - that's the VAE. Similarly, CLIP and the actual UNET are also stored in the same file just with different prefixes.

I'd probably do something like this if I had to put both interposer models in the same file:

import torch
from safetensors.torch import load_file, save_file

# Load both interposer models.
v1_to_xl = load_file("v1-to-xl_interposer-v1.1.safetensors")
xl_to_v1 = load_file("xl-to-v1_interposer-v1.1.safetensors")

# Prefix every key with the model it came from, then merge everything into one dict.
out_dict = {}
for k,v in v1_to_xl.items():
	out_dict[f"v1_to_xl.{k}"] = v
for k,v in xl_to_v1.items():
	out_dict[f"xl_to_v1.{k}"] = v

save_file(out_dict, "interposer-v1.1.safetensors")
List of keys before/after

xl->v1 keys:

dict_keys(['sequential.0.bias', 'sequential.0.weight', 'sequential.2.bias', 'sequential.2.weight', 'sequential.4.bias', 'sequential.4.weight', 'sequential.6.bias', 'sequential.6.weight'])

v1->xl keys:

dict_keys(['sequential.0.bias', 'sequential.0.weight', 'sequential.2.bias', 'sequential.2.weight', 'sequential.4.bias', 'sequential.4.weight', 'sequential.6.bias', 'sequential.6.weight'])

combined output keys:

dict_keys(['v1_to_xl.sequential.0.bias', 'v1_to_xl.sequential.0.weight', 'v1_to_xl.sequential.2.bias', 'v1_to_xl.sequential.2.weight', 'v1_to_xl.sequential.4.bias', 'v1_to_xl.sequential.4.weight', 'v1_to_xl.sequential.6.bias', 'v1_to_xl.sequential.6.weight', 'xl_to_v1.sequential.0.bias', 'xl_to_v1.sequential.0.weight', 'xl_to_v1.sequential.2.bias', 'xl_to_v1.sequential.2.weight', 'xl_to_v1.sequential.4.bias', 'xl_to_v1.sequential.4.weight', 'xl_to_v1.sequential.6.bias', 'xl_to_v1.sequential.6.weight'])

Then you just split off the ones you actually need while loading with a startswith or lambda or w/e.
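
Loading the combined file back and splitting it by prefix could then look something like this (a sketch, assuming the v1_to_xl/xl_to_v1 prefixes from the script above):

from safetensors.torch import load_file

combined = load_file("interposer-v1.1.safetensors")

def split_prefix(state, prefix):
    # Grab every key under "prefix." and strip the prefix again so the
    # resulting dict matches the original 'sequential.*' keys.
    return {k[len(prefix) + 1:]: v for k, v in state.items() if k.startswith(prefix + ".")}

v1_to_xl_state = split_prefix(combined, "v1_to_xl")
xl_to_v1_state = split_prefix(combined, "xl_to_v1")
# Each dict can now be passed to the matching model's load_state_dict().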

@city96
Owner

city96 commented Oct 11, 2023

@PurpleBlueAloeVera Figured I'd ping you, I re-trained the whole thing with a new architecture. It should work a lot better now for both xl->v1 and v1->xl. It still has some hue/saturation issues but overall it's an improvement.

I'd appreciate it if you could re-test using it with SDv2.x models as well, since that was one of the things you said worked sub-par.

(attached comparison image: INTERPOSER_V3)

@PurpleBlueAloeVera
Author

It indeed looks a LOT better here! Well done. And for sure, I'll try this ASAP and get back to you. Btw, no problem, don't hesitate to ping me if you'd like me to test / give feedback! I'm loving this thing you've built. :)

@TomLucidor

Q: how could this allow people to port SD1.5 LoRAs into SDXL? or is it strictly a Checkpoints thing?

@city96
Owner

city96 commented Nov 28, 2023

Q: how could this allow people to port SD1.5 LoRAs into SDXL? or is it strictly a Checkpoints thing?

I guess issue #1 kind of explains how you can do that. That's the only real way you can use v1 LoRAs with xl, but obviously it won't work for concept LoRAs, only character/style ones.


@Benzene82

I'm just getting past the beginner stage of ComfyUI/Stable Diffusion in general, and this process is exactly what I'm looking for. I've tried many ways of installing this, and the files all seem to be in the right directories; I just can't find a workflow. Dropping the .png into ComfyUI doesn't work. I must be missing something very obvious. Any help from anyone to get this to work would be greatly appreciated.
Thanks!

@city96
Owner

city96 commented Jan 3, 2024

@Benzene82 You mean the one in the image above? It's just a demo workflow but if you want it then here's the JSON metadata for it. Good thing I never delete anything lol. Feel free to reply if you got any questions.

@Benzene82

Benzene82 commented Jan 3, 2024 via email

@city96
Owner

city96 commented Jan 3, 2024

I'll test out other Models and LoRAs to learn more about how it works. I'm trying to get away from LoRAs that basically stamp what they are trained on into a subject, making copycat images.

Yeah, that can be annoying. I mostly use my own LoRAs now but a lot of the civitai ones are overtrained like that and need a really low weight to even work, if they work at all and aren't completely incompatible lol.

I guess you could look into controlnet or do what I tend to do, which is generate your base image with one model (in this case SDXL) and then img2img it (directly pass the latent via this node) on 1.5 at a high enough denoise. 0.5+ is recommended for this IMO. Some more esoteric stuff might also work, like running canny edge/openpose detection on the SDXL output and using that as an input for 1.5.

I thought blending 1.5 LoRAs with SDXL ones might add some 'variety' and possibly more realism. Simply blending the latent images makes a weird hybrid and I hope this node and process delivers better results.

This node is just meant to replace the need to do a VAE decode/encode between SDXL/SDv1, though people have used it for some more crazy stuff, like returning the leftover noise from XL and denoising it on v1 with the advanced KSampler. I guess you could convert the SDXL latent with this node and then pipe it into a latent composite/blend node together with the v1 one.
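
For context, the decode/encode round trip this node replaces would look roughly like this with diffusers (a hedged sketch - the model IDs are the usual public VAEs, not anything shipped by this repo):

import torch
from diffusers import AutoencoderKL

vae_xl = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
vae_v1 = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Dummy latent standing in for the SDXL sampler output (1024x1024 image -> 128x128 latent).
xl_latent = torch.randn(1, 4, 128, 128)

with torch.no_grad():
    image = vae_xl.decode(xl_latent / vae_xl.config.scaling_factor).sample
    v1_latent = vae_v1.encode(image).latent_dist.sample() * vae_v1.config.scaling_factor
# The interposer maps xl_latent -> v1_latent directly and skips both VAE passes.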

If you could send a link to the latest workflow, V3, I'd greatly appreciate it.

There's no "official" workflow for this repo. I don't really use SDXL anymore (switched to PixArt alpha for the initial image for my new stuff) but here's one of my old workflows for SDXL. It isn't very good but maybe it'll work as a starting point for you?

Link to workflow JSON

(attached screenshot: SDXLWF workflow)

@TomLucidor

@city96 what about object LoRAs? How would you get around with SD1.5 to SDXL?

a lot of the civitai ones are overtrained like that and need a really low weight to even work, if they work at all and aren't completely incompatible

That is also a concern. If that is the case, what would be the strategy for LoRA cleaning with a human in the loop? RLHF/PPO, or some other alternative that reduces the amount of human judgement on quality?

@city96
Owner

city96 commented Jan 3, 2024

what about object LoRAs? How would you get around with SD1.5 to SDXL?

Masking/inpainting I guess? Maybe using a similar enough placeholder object for XL?

LoRA cleaning

Not sure what you mean. How well a LoRA works will heavily depend on what model you use it with, so there's no universal "best" weight for a given LoRA. It could work perfectly with the model it was trained on while failing miserably if the model it's applied to is different enough (As an extreme example, run a regular 1.5 LoRA on DPO or TokenCompose and see how well that turns out lol).

You also won't immediately see a pattern if it's overtrained, so it might take a bit to realize it's just spitting out variations of the training images. Detecting this would be pretty hard as you'd need some sort of similarity score over a large batch of samples.
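
As an illustration of that similarity-score idea (my rough sketch, using CLIP image embeddings - not something this repo does): generate a batch of images with the LoRA under varied prompts/seeds and check how close they all are to each other.

import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def mean_pairwise_similarity(images):
    # images: list of PIL images generated with the LoRA under varied prompts/seeds.
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    sim = feats @ feats.T
    # Average the off-diagonal entries; values creeping toward 1.0 mean the
    # samples all look alike, which hints at an overtrained LoRA.
    n = sim.shape[0]
    return ((sim.sum() - n) / (n * (n - 1))).item()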

If you mean LoRA dataset cleaning, that's out of scope for this repo.

@TomLucidor

TomLucidor commented Jan 11, 2024

If you mean LoRA dataset cleaning

More like generating and filtering data from a LoRA and further refining it to be more "accurate" via human feedback (random image X is more accurate as "synthetic data" than random image Y). For "smelling" overtraining, I am not quite sure if there are ways to make things better (rediscovering an optimal weight, human feedback, etc.).

@TomLucidor

A bit of a side note, but X-Adapter might be just as useful: https://github.com/showlab/X-Adapter
