-
Things I like to do. I am not a great fan of text-based image synthesis; I prefer to work mainly on images. In addition, I suspect that a minimally trained diffusion model may not perform very well with CLIP in terms of visual results, so using strong visual guidance helps. Some techniques:

- Start from noise towards a target image (balancing between the model and the target image), possibly with a slight adjustment from a text prompt.
- Start from an initial image towards the model features, possibly with a slight adjustment from a text prompt.
- Start from an initial image and strengthen its effect by also using it as the target image.
- Start from an initial image towards a slightly different target image.
- Add text guidance to any of the above. If that leads to undesired detail, try --tdecay to decrease the text weight gradually, a little at every step, and compensate with a higher text weight in the beginning.

When using an init image, look for the right mix of image and noise in the beginning. Using just --image will let the noise override it altogether. One option is to set the balance between image and noise by reducing the noise with --mul (actually a divisor for the noise): something like 1.05 to 1.1 reduces the noise just a little, while 2 to 4 reduces it significantly. You can additionally weaken the image (make it lighter) with --weak (0 = no image, 1 = full image). There is also --skip to skip steps, similar to other implementations.
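As a rough sketch of how the --mul and --weak balance could look in code (the tensor setup and the exact mixing formula here are my assumptions, not the actual diffudiver implementation):

```python
import torch

def init_latent(init_image, mul=1.0, weak=1.0):
    # Hypothetical sketch: mix an init image with noise before diffusion.
    # mul  -- divisor for the noise (1.05-1.1 = slight reduction, 2-4 = strong), as --mul.
    # weak -- 0 = no image, 1 = full image, as --weak.
    noise = torch.randn_like(init_image) / mul
    return weak * init_image + noise
```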
-
You also need a model: either a pretrained one that someone else has trained, or you can train your own. Training a minidiffusion model does not require many resources. My models have been trained on a 3090 running at 150 to 250 W for a day or two, using datasets from a hundred images to a few thousand. Such minimal models work best when the training material is visually homogeneous enough. Selecting or making the images is a way to control the process visually and aesthetically. You can also resume training on an existing model, even with a different image set.

You may find, like I did, that the most interesting results happen during the transition phase, when the model starts picking up new visual features but has not yet forgotten the old ones. This means that the model mainly provides the style and becomes your image-making tool, and that is what counts when making the training set. A minimally trained model works exactly because it does not have to learn about making flowers, animals, landscapes or faces. Of course it should be possible to train a minidiffusion model with extensive image material too, if one has the required resources. But that is not what I am interested in, and there are enough general-purpose models and colabs for that already.
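For reference, a generic denoising-diffusion training step looks roughly like the sketch below (a standard noise-prediction objective, not diffutrainer's actual code); resuming on an existing model simply means loading its weights before continuing this loop, possibly with a new image set.

```python
import torch
import torch.nn.functional as F

def training_step(model, images, alphas_cumprod, optimizer):
    # One generic DDPM-style step: add noise at a random timestep,
    # then train the model to predict that noise.
    t = torch.randint(0, len(alphas_cumprod), (images.shape[0],), device=images.device)
    noise = torch.randn_like(images)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_bar.sqrt() * images + (1 - a_bar).sqrt() * noise
    loss = F.mse_loss(model(noisy, t), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```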
-
miniDiffusion is a lightweight tool for artistic work with images, based on denoising diffusion.
It consists of two tools: diffudiver, which is used to actually work on images, and diffutrainer, which is used to train diffusion models on one's image material of choice.
Let's first look at diffudiver. This diagram gives an overview of diffudiver (v2).
First, there is the pretrained model, whose learned visual features will affect, or even dominate, the visual output.
The diffusion can be initialised from noise, from an initial image, or from a mix of the two.
Then we can have targets, which guide image evolution through a loss function: we measure how far the image currently is from each target and adjust it towards them.
Note that while the effects of the pretrained model, the initial image and the target image are visual, the effects of a text prompt and an image prompt are mainly semantic.
Finally, we have the learning rate, which controls how strongly the combined effect of all targets changes the image at each step.
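In code, one guidance step could look roughly like this (names and structure are illustrative, not diffudiver's actual API): each target contributes a weighted loss, the gradient of the combined loss is taken with respect to the image, and the learning rate scales the adjustment.

```python
import torch

def guidance_step(image, targets, lr):
    # `targets` is a list of (loss_fn, weight) pairs, e.g. an MSE loss against
    # a target image (visual) or a CLIP loss against a text/image prompt (semantic).
    image = image.detach().requires_grad_(True)
    loss = sum(weight * loss_fn(image) for loss_fn, weight in targets)
    loss.backward()
    with torch.no_grad():
        image = image - lr * image.grad
    return image.detach()
```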
We don't usually want to use all these possibilities at the same time, but rather see them as tools which can be used in multiple ways to enable different techniques.