Encoder updating in ICM implementations #10

xf-zhao · 2022-01-25T15:11:07Z

Hi, thank you all for this remarkable work. The code is very well structured.

I have one question about the ICM implementation. I noticed that the encoder is only updated through the loss of the forward and inverse prediction models, and is not updated when the critic networks update (since obs is detached when calling self.update_critic), even though there is a parameter update_encoder=True that should control this behaviour (see url_benchmark/agent/icm.py, lines 118-125, reproduced below).

        if not self.update_encoder:
            obs = obs.detach()
            next_obs = next_obs.detach()

        # update critic
        metrics.update(
            self.update_critic(obs.detach(), action, reward, discount,
                               next_obs.detach(), step))
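
To illustrate the effect of the unconditional detach, here is a minimal PyTorch sketch. It is not the repository's code: the two linear layers are hypothetical stand-ins for the encoder and the critic, and the always_detach argument exists only to mimic the call quoted above.

```python
import torch
import torch.nn as nn


def encoder_gets_critic_grad(update_encoder: bool, always_detach: bool) -> bool:
    """Return True if a critic-style loss backpropagates into the encoder."""
    torch.manual_seed(0)
    encoder = nn.Linear(4, 3)  # hypothetical stand-in for the encoder
    critic = nn.Linear(3, 1)   # hypothetical stand-in for the critic

    obs = encoder(torch.randn(8, 4))
    if not update_encoder:
        obs = obs.detach()

    # Mimics the quoted call site: obs.detach() is passed regardless of the flag.
    critic_in = obs.detach() if always_detach else obs

    loss = critic(critic_in).pow(2).mean()  # placeholder loss, not the real TD loss
    loss.backward()

    # The encoder receives a gradient only if the graph was not cut.
    return encoder.weight.grad is not None


results = {
    "quoted call, update_encoder=True": encoder_gets_critic_grad(True, True),
    "flag respected, update_encoder=True": encoder_gets_critic_grad(True, False),
    "flag respected, update_encoder=False": encoder_gets_critic_grad(False, False),
}
print(results)
```

Under these assumptions, only the second case sends a critic gradient into the encoder, so with the call as quoted the update_encoder flag has no effect on the critic update.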

I guess this was a deliberate choice after testing with it on and off? If so, it raises another question: the encoder is trained during pretraining, but the randomly initialized one ("random init" in the paper) is not. So when comparing them, we cannot conclude that the representations learned with ICM are better than those from random exploration.

Thank you in advance!

seolhokim · 2022-03-26T12:07:42Z

I have the same question. DDPG updates the encoder when training the critic, but APT-ICM trains the encoder only when training ICM. From my point of view, that does not look sufficient.
