Stelath/audiffuse

Audiffuse

About

I set out to solve a very subjective problem: generating album art based only on the audio of a song. This is a hard task, and while the resulting model is by no means perfect, it is interesting to see what it has learned to generate when conditioned on music.

Model Architecture

The model is a Latent Diffusion Model whose conditioning head has been replaced by the MERT-v1-95M Music2Vec model.

In the above diagram, this means the MERT-v1-95M output vector for the input song is passed into $\tau_{\theta}$.
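To make the conditioning path more concrete, here is a minimal pure-Python sketch of the single-head cross-attention that Latent Diffusion uses to inject $\tau_{\theta}(y)$ into the UNet: flattened UNet features act as queries, and per-frame audio embeddings act as keys and values. It assumes the MERT-v1-95M hidden states (taken here to be 768-dimensional) are fed in directly as the context sequence. How Audiffuse actually pools or projects those states is not described here, so the toy shapes, the random weights, and every helper name below are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; matrices are lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, std=0.02):
    """Hypothetical stand-in for learned weights or real activations."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def cross_attention(unet_tokens, audio_context, w_q, w_k, w_v):
    """softmax(Q K^T / sqrt(d)) V, with Q from the UNet and K, V from the audio.

    unet_tokens:   (N x d_model) flattened spatial UNet features
    audio_context: (M x d_ctx)   per-frame audio embeddings (assumed MERT states)
    Returns the (N x d_attn) attended features and the (N x M) attention weights.
    """
    q = matmul(unet_tokens, w_q)    # (N x d_attn)
    k = matmul(audio_context, w_k)  # (M x d_attn)
    v = matmul(audio_context, w_v)  # (M x d_attn)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))  # (N x M)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_ctx, d_attn = 8, 768, 8  # d_ctx: assumed MERT-v1-95M hidden size
    tokens = random_matrix(4, d_model, rng)  # 4 latent positions (toy size)
    context = random_matrix(6, d_ctx, rng)   # 6 audio frames (toy size)
    out, weights = cross_attention(
        tokens,
        context,
        random_matrix(d_model, d_attn, rng),
        random_matrix(d_ctx, d_attn, rng),
        random_matrix(d_ctx, d_attn, rng),
    )
    print(len(out), "x", len(out[0]), "output;", [round(sum(r), 6) for r in weights])
```

Each row of `weights` is a distribution over audio frames, so every spatial position in the latent decides how much to "listen" to each part of the song.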

Generations

Here is a piece of album art generated by the model: the top left is the original album art for the song, and the rest are novel album art images generated by the model.

While music is obviously very subjective, there are hints that the model has learned to match the general vibe of a song. From a qualitative perspective, all the images tend to be very calm, and the model has matched the color palette fairly accurately as well, except for the bottom-right image.

Here is another example of the model picking up on cues in the song: both the original and some of the generated images have a very urban feel, with the top right literally being a brick wall and the bottom left appearing to contain at least the pattern of an automobile. Although these don't appear to have the same fidelity as the images generated from the bluesouth song, they do illustrate the model's ability to derive some meaning from the music.

Interesting Notes

We fine-tuned the UNet of Stable Diffusion v1.4 and used the corresponding autoencoder. Even though the conditioning head is completely different, this surprisingly produced less abstract results than training the UNet from scratch.
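Fine-tuning the UNet this way still optimizes the standard Latent Diffusion objective $\lVert \epsilon - \epsilon_\theta(z_t, t, \tau_\theta(y)) \rVert^2$, where $z_t$ is a noised latent from the Stable Diffusion autoencoder. The sketch below computes one sample of that loss in pure Python. The 1000-step `scaled_linear` noise schedule (betas 0.00085 to 0.012) and the 0.18215 latent scaling factor are the Stable Diffusion v1.x defaults; that this repo kept them unchanged is an assumption, and `predict_noise` is a hypothetical stand-in for the fine-tuned UNet. A real training loop would operate on batched PyTorch tensors rather than flat lists.

```python
import math
import random

NUM_TRAIN_TIMESTEPS = 1000
BETA_START, BETA_END = 0.00085, 0.012  # SD v1.x scheduler defaults (assumed unchanged)
LATENT_SCALE = 0.18215                 # SD v1.x autoencoder latent scaling factor


def scaled_linear_alphas_cumprod(n=NUM_TRAIN_TIMESTEPS, beta_start=BETA_START, beta_end=BETA_END):
    """Cumulative product of (1 - beta_t) for the 'scaled_linear' schedule,
    where betas are spaced linearly in sqrt-space and then squared."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alphas_cumprod, running = [], 1.0
    for i in range(n):
        beta = (lo + (hi - lo) * i / (n - 1)) ** 2
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


def add_noise(z0, noise, t, alphas_cumprod):
    """Forward process q(z_t | z_0): sqrt(a_bar_t) * z0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alphas_cumprod[t]
    return [math.sqrt(a_bar) * z + math.sqrt(1.0 - a_bar) * e for z, e in zip(z0, noise)]


def noise_prediction_loss(predict_noise, vae_latent, audio_context, rng, alphas_cumprod):
    """One Monte Carlo sample of the mean squared error between the true noise
    and the predicted noise eps_theta(z_t, t, audio_context)."""
    z0 = [LATENT_SCALE * x for x in vae_latent]  # scale raw autoencoder latents
    t = rng.randrange(len(alphas_cumprod))       # uniform random timestep
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    z_t = add_noise(z0, noise, t, alphas_cumprod)
    eps_hat = predict_noise(z_t, t, audio_context)
    return sum((e - p) ** 2 for e, p in zip(noise, eps_hat)) / len(noise)


if __name__ == "__main__":
    rng = random.Random(0)
    alphas_cumprod = scaled_linear_alphas_cumprod()
    fake_latent = [rng.gauss(0.0, 1.0) for _ in range(4 * 8 * 8)]  # toy 4x8x8 latent, flattened

    def zero_predictor(z_t, t, audio_context):
        """Hypothetical untrained 'UNet' that always predicts zero noise."""
        return [0.0] * len(z_t)

    print(noise_prediction_loss(zero_predictor, fake_latent, None, rng, alphas_cumprod))
```

Because only the UNet's weights are updated, the autoencoder's latent space stays exactly as Stable Diffusion v1.4 learned it, which is one plausible reason starting from the pretrained UNet helps even with a new conditioning signal.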

TODO

  • Release Weights
  • Release How To

About

An audio-to-image generator.