[GIVT] Inverse relation between sampling FID and training loss #125
Unanswered
adarshm-nd asked this question in Q&A
Replies: 1 comment 1 reply
-
I have the following hand-wavy intuition: A larger |
-
In the GIVT paper, Fig. 5 shows that Transformers with higher $\beta$ achieve lower sampling FID, yet Fig. 9 shows that their training loss is higher. This seems counter-intuitive: why do training loss and sampling FID show this inverse relation across Figs. 5 and 9? Thanks in advance.
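One generic point that may be relevant here (a hedged sketch, not taken from the GIVT paper): if different $\beta$ values change the distribution of the latents the Transformer has to model, then the training losses are likelihood values measured on *different* target distributions and are not directly comparable. Even a perfectly fit Gaussian model has an average NLL that grows with the data's variance, so a higher loss need not mean a worse fit:

```python
import numpy as np

# Generic illustration (not GIVT-specific): the average NLL of a *perfectly*
# fit Gaussian still depends on the data's variance, so likelihood losses
# measured on differently-distributed targets are not directly comparable.
rng = np.random.default_rng(0)

def avg_nll_of_perfect_fit(sigma, n=100_000):
    """Average negative log-likelihood of N(0, sigma^2) samples scored
    under the true (i.e. perfectly fit) N(0, sigma^2) density."""
    x = rng.normal(0.0, sigma, size=n)
    return np.mean(0.5 * np.log(2 * np.pi * sigma**2) + 0.5 * x**2 / sigma**2)

# Analytically, the expected NLL is 0.5*log(2*pi*sigma^2) + 0.5,
# which increases monotonically with sigma.
for sigma in [0.5, 1.0, 2.0]:
    print(f"sigma={sigma}: avg NLL ~ {avg_nll_of_perfect_fit(sigma):.3f}")
```

So a model with higher $\beta$ could in principle show higher training loss simply because its targets are harder (or differently scaled) in likelihood terms, while still producing samples with better FID.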