Hey there,
Would it be wise to include an entropy term in your PPO implementation? If so, how would you do that?
My second question is: why do you use `F.smooth_l1_loss` instead of `0.5 * MSELoss` for the critic?
Here are some snippets as a suggestion, though I am not absolutely sure about them:
```python
surr1 = ratio * advantage
surr2 = torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip) * advantage
actor_loss = -torch.min(surr1, surr2)
critic_loss = F.smooth_l1_loss(self.v(s), td_target.detach())
# alternative: 0.5 * self.MseLoss(state_values, torch.tensor(rewards))
# beta = 0.01  # entropy coefficient; encourages exploring different policies
total_loss = critic_loss + actor_loss  # - beta * dist_entropy
```
To include entropy, we would need a function like this:
```python
def evaluate(self, state, action):
    # What values are returned here?
    action_probs = self.action_layer(state)
    dist = Categorical(action_probs)  # torch.distributions.Categorical
    action_logprobs = dist.log_prob(action)
    dist_entropy = dist.entropy()
    state_value = self.value_layer(state)
    return action_logprobs, torch.squeeze(state_value), dist_entropy
```
However, I am not sure about the best way to include entropy in your implementation.
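For what it's worth, here is a rough sketch of how I imagine the pieces fitting together inside the update step. It assumes an `evaluate` method like the one above on `self.policy`, and that `states`, `actions`, `old_logprobs`, `advantage`, and `td_target` have already been computed; the name `entropy_coef` and the value 0.01 are just placeholders, not anything from your code:

```python
# Hypothetical update step combining the clipped surrogate, the critic loss,
# and an entropy bonus. Assumes self.policy.evaluate(states, actions) returns
# (logprobs, state_values, dist_entropy) as sketched above.
logprobs, state_values, dist_entropy = self.policy.evaluate(states, actions)

# Probability ratio between the new and old policy.
ratio = torch.exp(logprobs - old_logprobs.detach())

# Clipped surrogate objective.
surr1 = ratio * advantage
surr2 = torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip) * advantage
actor_loss = -torch.min(surr1, surr2)

# Critic loss (either smooth L1 or 0.5 * MSE could go here).
critic_loss = F.smooth_l1_loss(state_values, td_target.detach())

# Subtracting the entropy term encourages exploration.
entropy_coef = 0.01  # placeholder value
loss = actor_loss + critic_loss - entropy_coef * dist_entropy

self.optimizer.zero_grad()
loss.mean().backward()
self.optimizer.step()
```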
I would be glad for some help.