Seraena is 🚧 WIP 🚧 PyTorch code for stably training mode-dropping deterministic latent autoencoders like TAESD using only conditional adversarial loss (without LPIPS/L1 or pretraining).
This repo includes an example TAESDXL training notebook which trains a lightweight single-step decoder for the SDXL VAE using Seraena. It also trains a simple (MSE-distilled) encoder for completeness.
If you find any other interesting uses for the Seraena code / models, LMK and I can link them here.
It's basically the usual PatchGAN discriminator + rescaled gradient setup (just with a replay buffer on generated samples). See the code.
Although Seraena is quite simple, there are still several YOLO'd hyperparameters and design choices present in the Seraena code (learning rates, batch and replay buffer size, discriminator architecture). I haven't done any serious benchmarking, ablations, or tuning of these choices. I also haven't verified if Seraena can match the full performance of released TAESD or SD-VAE.
If you want a serious, battle-tested autoencoder training repo I recommend looking at the Stability or MosaicML codebases.