You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Being a total newbie at machine learning, I'm wondering, what are the main differences between Puzer's approach and transparent_latent_gan?
Another issue - transparent_latent_gan is using the smaller CelebA dataset, so that might be the reason why sometimes its features get entangled too much and StyleGAN gets stuck when you try to lock and combine too many features (try to adjust the sliders to create an old, bald, non-smiling, bearded man with eyeglasses).
I'm wondering if Puzer's approach could work better? I tried current age direction and noticed that at some point it tries to add glasses and beard. I guess, those two features got entangled with age and I'm not sure what could be done to disentangle them - I hope to get only wrinkles and receding hairline for age direction.
Also, when encoding images, I found out that sometimes align works incorrectly cropping away top of a head. And for some of my images, the optimal encoder combination seems to be learning rate of 4.0 and image size of 512. With default settings (learning rate of 1 and image size 256) it got some tricky images (old black&white photos) or complex scenarios (large mustache over lips) totally corrupted, and for some less complex images it lost enough tiny details to make the photo feel too "uncanny" to consider to be exact match, especially, for younger people who don't have enough deep wrinkles or beards and also when images are shot with lots of light, so those tiny details and shadows matter a lot.
Of course, 4.0 @ 512 can take pretty long time to train, and sometimes 1000 iterations are not enough. With one specific tricky image I went as far as to 4000 iterations to get satisfactory results, while for some other images such high learning rate + iterations led to washed-out images (overfitting?).
@ramapinnimty You can refer to the GANSpace and InterfaceGAN. Both of the methods focus on disentangled directions. The first one uses PCA with unsupervised, the second one uses the idea of linear projection.
I, too, got curious and played with GANSpace a bit.
The conclusions are as follows.
While limiting components to layers helps a lot for disentangling some minor features (color, background(, still it is not good enough for major features (age, gender, beard, glasses) because they are too much entangled with each other.
I suspect the source of the problem is that GAN was trained on images of different people with different features, so it's difficult to isolate a feature, let's say "beard".
I think it would work much better if GAN was trained on images of the same person with various degrees of features that are interesting to us. For example, on a series of photos of the same person with different degrees of beard, age, etc. Of course, it's impossible to collect such a dataset from real people, but it might be possible to use high-definition 3D model renders. Still, it would be a very time-consuming endeavor to create enough models.
I couldn't try InterfaceGAN, though, because my GPU doesn't have enough RAM (4GB leads to infamous "RuntimeError: CUDA out of memory.").
I just found a project that allows controlling a bunch of StyleGAN features through UI knobs:
https://github.com/SummitKwan/transparent_latent_gan
Being a total newbie at machine learning, I'm wondering, what are the main differences between Puzer's approach and transparent_latent_gan?
Another issue - transparent_latent_gan is using the smaller CelebA dataset, so that might be the reason why sometimes its features get entangled too much and StyleGAN gets stuck when you try to lock and combine too many features (try to adjust the sliders to create an old, bald, non-smiling, bearded man with eyeglasses).
I'm wondering if Puzer's approach could work better? I tried current age direction and noticed that at some point it tries to add glasses and beard. I guess, those two features got entangled with age and I'm not sure what could be done to disentangle them - I hope to get only wrinkles and receding hairline for age direction.
Also, when encoding images, I found out that sometimes align works incorrectly cropping away top of a head. And for some of my images, the optimal encoder combination seems to be learning rate of 4.0 and image size of 512. With default settings (learning rate of 1 and image size 256) it got some tricky images (old black&white photos) or complex scenarios (large mustache over lips) totally corrupted, and for some less complex images it lost enough tiny details to make the photo feel too "uncanny" to consider to be exact match, especially, for younger people who don't have enough deep wrinkles or beards and also when images are shot with lots of light, so those tiny details and shadows matter a lot.
Of course, 4.0 @ 512 can take pretty long time to train, and sometimes 1000 iterations are not enough. With one specific tricky image I went as far as to 4000 iterations to get satisfactory results, while for some other images such high learning rate + iterations led to washed-out images (overfitting?).
Originally posted by @progmars in #5 (comment)
The text was updated successfully, but these errors were encountered: