I set out to tackle a very subjective problem: generating album art based only on the audio of a song. This is obviously hard to do, and while the resulting model is by no means perfect, it is interesting to see what it has learned to generate conditioned on music.
The model is a latent diffusion model whose conditioning head has been replaced with the MERT-v1-95M Music2Vec model.
In the above diagram this means passing the MERT-v1-95M vector output for the song into the cross-attention layers of the denoising UNet, in place of the text embedding that would normally condition generation.
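For concreteness, here is a minimal sketch of how that conditioning path can be wired up with the Hugging Face transformers library. The checkpoint id `m-a-p/MERT-v1-95M` is the public release, but the file name and preprocessing details below are illustrative assumptions, not the exact pipeline used here.

```python
import torch
import torchaudio
from transformers import AutoModel, Wav2Vec2FeatureExtractor

# Load MERT-v1-95M (public checkpoint; requires trust_remote_code).
processor = Wav2Vec2FeatureExtractor.from_pretrained(
    "m-a-p/MERT-v1-95M", trust_remote_code=True)
mert = AutoModel.from_pretrained("m-a-p/MERT-v1-95M", trust_remote_code=True)

# "song.wav" is a placeholder; MERT expects 24 kHz mono audio.
waveform, sr = torchaudio.load("song.wav")
mono = torchaudio.functional.resample(
    waveform.mean(dim=0), sr, processor.sampling_rate)

inputs = processor(mono.numpy(), sampling_rate=processor.sampling_rate,
                   return_tensors="pt")
with torch.no_grad():
    # Shape (1, num_frames, 768): one embedding per audio frame.
    music_embeddings = mert(**inputs).last_hidden_state
```

Conveniently, MERT's 768-dim hidden states match the width the Stable Diffusion v1.x UNet expects from its CLIP text encoder, so the embedding sequence can be passed straight in as `encoder_hidden_states`.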
Here is an example of album art generated by the model: the top left is the original album art for the song, and the rest are novel covers generated by the model.
While music is obviously very subjective, there are hints here that the model has learned to match the general vibe of the song. Qualitatively, all of the images tend to be very calm, and the model has matched the color palette fairly accurately as well, save for the bottom right image.
Here is another example of the model picking up on cues in the song: both the original and some of the generated images have a very urban feel, with the top right literally being a brick wall and the bottom left appearing to contain at least the pattern of an automobile. Though these don't have the same fidelity as the images generated for the bluesouth song, they do illustrate the model's ability to derive some meaning from the music.
We fine-tuned the UNet of Stable Diffusion v1.4 and used the corresponding autoencoder. Despite the conditioning head being completely different, this surprisingly produced less abstract results than training the UNet from scratch.
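A rough sketch of that fine-tuning setup with the diffusers library is below. The checkpoint id, learning rate, and MSE noise-prediction loss are standard latent-diffusion training boilerplate, shown here as an assumed reconstruction rather than the exact training code.

```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler

# Pretrained Stable Diffusion v1.4 components (assumed checkpoint id).
repo = "CompVis/stable-diffusion-v1-4"
vae = AutoencoderKL.from_pretrained(repo, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet")
scheduler = DDPMScheduler.from_pretrained(repo, subfolder="scheduler")

vae.requires_grad_(False)  # autoencoder stays frozen; only the UNet is tuned
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def train_step(pixel_values, music_embeddings):
    """One denoising step. pixel_values: (B, 3, 512, 512) in [-1, 1];
    music_embeddings: (B, seq, 768) MERT output in place of text embeddings."""
    with torch.no_grad():
        latents = vae.encode(pixel_values).latent_dist.sample()
        latents = latents * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy = scheduler.add_noise(latents, noise, t)
    # Predict the added noise, conditioned on the music embeddings.
    pred = unet(noisy, t, encoder_hidden_states=music_embeddings).sample
    loss = torch.nn.functional.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Starting from the pretrained UNet means the model only has to learn to reinterpret its conditioning input, rather than relearn image structure from scratch, which is one plausible reason the results come out less abstract.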
- Release Weights
- Release How To