This repository contains a PyTorch implementation of the Stanford CS229 course project Soundiffusion. We adopt two state-of-the-art audio-to-image models.
First, clone the Sound2Scene repository and download its pretrained audio encoder:

```shell
cd <reponame>
git clone https://github.com/postech-ami/Sound2Scene.git
```
To train, choose the components you want to update; here we train the UNet and the embedder:

```shell
sh train.sh
```
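Training only a subset of components typically means freezing the parameters of the rest. The sketch below shows this standard PyTorch pattern; the module names are hypothetical stand-ins, not the actual models defined in this repo or its training script.

```python
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Enable or disable gradient updates for every parameter in `module`."""
    for p in module.parameters():
        p.requires_grad = trainable

# Hypothetical stand-ins for the real audio encoder, embedder, and UNet.
audio_encoder = nn.Linear(512, 768)
embedder = nn.Linear(768, 768)
unet = nn.Linear(768, 768)

# Train only the UNet and the embedder; keep the audio encoder frozen.
set_trainable(audio_encoder, False)
set_trainable(embedder, True)
set_trainable(unet, True)
```

Only parameters with `requires_grad=True` should then be passed to the optimizer.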
To run inference, load the pretrained audio encoder, embedder, and UNet checkpoints:

```shell
sh inf.sh
```
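Restoring the checkpoints usually follows the standard PyTorch state-dict pattern. A minimal sketch, assuming checkpoints saved with `torch.save`; the module and path below are illustrative placeholders, not files shipped with this repo.

```python
import torch
import torch.nn as nn

# Hypothetical module; the real embedder/UNet definitions live in the repo.
embedder = nn.Linear(768, 768)

# Stand-in for a checkpoint produced by training (path is a placeholder).
torch.save(embedder.state_dict(), "embedder.pth")

# At inference time, restore the weights and switch to eval mode.
embedder.load_state_dict(torch.load("embedder.pth", map_location="cpu"))
embedder.eval()
```

The same load-and-eval step would be repeated for the audio encoder and the UNet before generating images.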