Hi, thank you all for this remarkable work. The code is very well structured.
I have one question about the implementation of ICM. I noticed that the encoder is updated only through the loss of the forward and inverse prediction models, and is not updated when the critic networks update (since obs is detached before calling self.update_critic), even though there is a parameter update_encoder=True that should control this behaviour (see url_benchmark/agent/icm.py, lines 118-125).
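To make the mechanism concrete, here is a minimal PyTorch sketch of what detaching obs does to the encoder's gradients. It is not the code from url_benchmark/agent/icm.py: the two nn.Linear layers, the squared-output loss, and the helper name critic_encoder_grad are hypothetical stand-ins chosen only to show the effect.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def critic_encoder_grad(detach_obs: bool):
    """Return the encoder weight gradient after one critic-loss backward pass.

    The modules are toy stand-ins (assumptions), not the repository's networks.
    """
    encoder = nn.Linear(4, 3)  # stand-in for the observation encoder
    critic = nn.Linear(3, 1)   # stand-in for the critic network

    obs = encoder(torch.randn(8, 4))
    if detach_obs:
        # detach() cuts the autograd graph here, so the critic loss
        # cannot propagate gradients back into the encoder.
        obs = obs.detach()

    critic_loss = critic(obs).pow(2).mean()  # placeholder loss
    critic_loss.backward()
    return encoder.weight.grad


grad_detached = critic_encoder_grad(detach_obs=True)
grad_attached = critic_encoder_grad(detach_obs=False)

print("encoder grad with detached obs:", grad_detached)
print("encoder grad without detach is populated:", grad_attached is not None)
```

With the detached input the encoder never receives a gradient from the critic loss, so an optimizer step on the encoder would have nothing to apply regardless of any update_encoder-style flag.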
I guess this was a choice made after testing with it on and off? If so, it raises another question: the encoder is trained during the pretraining procedure, but the randomly initialized one ("random init" in the paper) is not. So when comparing them, we cannot say that the representations learned using ICM are better than those from random exploration.
Thank you in advance!
I have the same question. DDPG updates the encoder when training the critic, but APT-ICM trains the encoder only when training ICM. From my point of view, that does not seem to be enough.