This repository contains a PyTorch implementation of the Stanford CS229 course project Soundiffusion. We adopt two state-of-the-art audio-to-image models.
First, clone the Sound2Scene repository and download its pretrained audio encoder:

```shell
cd <reponame>
git clone https://github.com/postech-ami/Sound2Scene.git
```
To train, choose the components you want to update; here we train the UNet and the embedder:

```shell
sh train.sh
```
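Training only a subset of components typically means freezing the parameters of the rest. The sketch below shows this standard PyTorch pattern; the module names are hypothetical stand-ins, not the actual models defined in this repo or its training script.

```python
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Enable or disable gradient updates for every parameter in `module`."""
    for p in module.parameters():
        p.requires_grad = trainable

# Hypothetical stand-ins for the real audio encoder, embedder, and UNet.
audio_encoder = nn.Linear(512, 768)
embedder = nn.Linear(768, 768)
unet = nn.Linear(768, 768)

# Train only the UNet and the embedder; keep the audio encoder frozen.
set_trainable(audio_encoder, False)
set_trainable(embedder, True)
set_trainable(unet, True)
```

Only parameters with `requires_grad=True` should then be passed to the optimizer.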
To run inference, load the pretrained audio encoder, embedder, and UNet checkpoints:

```shell
sh inf.sh
```
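Restoring the checkpoints usually follows the standard PyTorch state-dict pattern. A minimal sketch, assuming checkpoints saved with `torch.save`; the module and path below are illustrative placeholders, not files shipped with this repo.

```python
import torch
import torch.nn as nn

# Hypothetical module; the real embedder/UNet definitions live in the repo.
embedder = nn.Linear(768, 768)

# Stand-in for a checkpoint produced by training (path is a placeholder).
torch.save(embedder.state_dict(), "embedder.pth")

# At inference time, restore the weights and switch to eval mode.
embedder.load_state_dict(torch.load("embedder.pth", map_location="cpu"))
embedder.eval()
```

The same load-and-eval step would be repeated for the audio encoder and the UNet before generating images.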