This repository has been archived by the owner on May 28, 2024. It is now read-only.

What if the observation is extracted features instead of images and has much smaller dimension than latent? #59

Open
seheevic opened this issue Feb 6, 2020 · 1 comment

Comments


seheevic commented Feb 6, 2020

Hi!
I'm not sure you still answer questions here 😊, but I'm stuck on a problem beyond my math skills. I hope you can help me.

The question concerns the loss function of your RSSM, which uses a variational approach. The reconstruction loss of the VAE comes from p(o_t|s_t), since the decoder maps from the latent to the image. In that case the observation (an image) has a much larger dimension than the latent. But when o_t has a much smaller dimension than the latent (for example, the 4 values of CartPole in OpenAI Gym's classic_control, versus a latent of, say, 32 to 64), I don't think p(o_t|s_t) can learn any meaningful distribution. Because the conditioning state s_t is sampled from the variational posterior q(s_t|a_1:t, o_1:t), which has already seen the current observation o_t, I suspect the model could simply learn to copy the full o_t into s_t, since s_t has many more dimensions.

In this situation (a non-image observation with a small dimension), can we still use this VAE-like approach?
Or is there some other technique that is more reasonable in this case?
I hope this worry makes sense to you. 😕
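To make the worry concrete, here is a toy numpy sketch (my own construction, not code from this repository). It shows that when the latent is wider than the observation, a posterior that simply embeds o_t verbatim into the first few latent dimensions, paired with a decoder that reads those dimensions back out, already maximizes a unit-variance Gaussian reconstruction likelihood p(o_t|s_t):

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, latent_dim = 4, 32  # CartPole-like observation vs. a wider latent

o_t = rng.normal(size=obs_dim)

# A "copying" posterior sample: place o_t verbatim in the first 4 latent dims.
s_t = np.zeros(latent_dim)
s_t[:obs_dim] = o_t

# A decoder that just reads those dims back out (a fixed linear projection).
W = np.zeros((obs_dim, latent_dim))
W[np.arange(obs_dim), np.arange(obs_dim)] = 1.0
o_hat = W @ s_t

# Unit-variance diagonal-Gaussian reconstruction log-likelihood log p(o_t|s_t).
# The squared-error term vanishes, so this copy achieves the maximum possible value.
log_p = -0.5 * np.sum((o_t - o_hat) ** 2) - 0.5 * obs_dim * np.log(2 * np.pi)

print(np.allclose(o_hat, o_t))
```

Nothing in the reconstruction term alone penalizes this solution; only the KL term against the prior/transition model pushes back, which is why the question of whether it still shapes a useful latent is a fair one.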


abrandenb commented Feb 10, 2020

Since the autoencoder is used for dimensionality reduction (in the default configs, from 64x64x3 = 12,288 dimensions down to around 500), I would not apply it in the scenario you describe. If you have a low-dimensional input, you can skip the autoencoder, since it wouldn't give you any gain. I assume you can still learn the latent dynamics model and the reward model and then apply MPC, just as PlaNet would do if you scrap the VAE.
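The suggestion above (skip the encoder/decoder, treat the low-dimensional observation as the model state, and plan with MPC over a learned dynamics and reward model) can be sketched as follows. This is a minimal illustration with stand-in `dynamics` and `reward` functions and random-shooting MPC; the real models would be learned networks, and PlaNet itself uses CEM rather than plain random shooting:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, horizon, n_candidates = 4, 1, 10, 256

def dynamics(s, a):
    # Stand-in for a learned dynamics model over the 4-dim observation itself.
    return s + 0.1 * np.concatenate([s[1:], a])

def reward(s):
    # Stand-in for a learned reward model, e.g. keep the state near the origin.
    return -np.sum(s ** 2)

def plan(s0):
    """Random-shooting MPC: sample action sequences, roll them out through the
    model, and return the first action of the best-scoring sequence."""
    best_ret, best_a0 = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, act_dim))
        s, ret = s0, 0.0
        for a in actions:
            s = dynamics(s, a)
            ret += reward(s)
        if ret > best_ret:
            best_ret, best_a0 = ret, actions[0]
    return best_a0

a0 = plan(np.array([0.5, -0.2, 0.1, 0.0]))
print(a0.shape)  # (1,)
```

At each environment step you would execute only `a0`, observe the next state, and replan, exactly as in the image-based pipeline but without any reconstruction loss.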
