Dear all,
The standard implementation of a stochastic actor with a Gaussian policy generates actions from a mean (mu) and a standard deviation (sigma). In the sigma branch, a Softplus activation on the output layer is commonly used to ensure that sigma is nonnegative. In the recent Tianshou 0.5.0, the class tianshou.utils.net.continuous.ActorProb, which builds standard stochastic actor networks for users, computes sigma either as a state-independent parameter by default, using sigma = (self.sigma_param.view(shape) + torch.zeros_like(mu)).exp(),
or from the state features when conditioned_sigma is set, using sigma = torch.clamp(self.sigma(logits), min=SIGMA_MIN, max=SIGMA_MAX).exp(), where SIGMA_MIN = -20 and SIGMA_MAX = 2.
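For reference, here is a minimal standalone PyTorch sketch of the two existing behaviours. The batch size, action dimension, and feature size are made up for illustration; only the sigma formulas mirror the ActorProb code quoted above:
import torch
import torch.nn as nn

SIGMA_MIN, SIGMA_MAX = -20, 2

mu = torch.randn(4, 3)  # fake batch of 4 means over 3 action dimensions

# default path: sigma is a state-independent learnable parameter, exponentiated
sigma_param = nn.Parameter(torch.zeros(3, 1))
shape = [1] * len(mu.shape)
shape[1] = -1
sigma_default = (sigma_param.view(shape) + torch.zeros_like(mu)).exp()

# conditioned_sigma=True path: sigma comes from the state features ("logits"),
# clamped in log-space and then exponentiated
logits = torch.randn(4, 16)
sigma_head = nn.Linear(16, 3)
sigma_conditioned = torch.clamp(sigma_head(logits), min=SIGMA_MIN, max=SIGMA_MAX).exp()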
Therefore, we consider the following modifications:
Include an additional parameter, declared as sigma_softplus: bool = True and documented as:
:param bool sigma_softplus: True if sigma is computed from the input through a final Softplus layer, False if the handling of sigma depends only on conditioned_sigma. Default to True.
Add a Softplus layer as the last layer of the sigma branch in the __init__ function, when sigma is conditioned:
self._sigma_softplus = sigma_softplus
# a Softplus sigma head implies a state-conditioned sigma branch
self._c_sigma = True if sigma_softplus else conditioned_sigma
if self._c_sigma:
    self.sigma = MLP(
        input_dim,  # type: ignore
        self.output_dim,
        hidden_sizes,
        device=self.device,
    )
    if sigma_softplus:
        # unwrap the MLP into its layers and append a Softplus activation
        sigma_layers = [
            module for module in self.sigma.modules()
            if not isinstance(module, (nn.Sequential, MLP))
        ]
        sigma_layers += [nn.Softplus()]
        self.sigma = nn.Sequential(*sigma_layers)
Add the following check in the forward function:
if self._c_sigma:
    if self._sigma_softplus:
        # the Softplus head already outputs a nonnegative sigma
        sigma = self.sigma(logits)
    else:
        sigma = torch.clamp(self.sigma(logits), min=SIGMA_MIN, max=SIGMA_MAX).exp()
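As a sanity check, here is a minimal standalone sketch, independent of Tianshou and with layer sizes chosen only for illustration, comparing the proposed Softplus branch with the existing clamp-and-exp branch; both produce valid nonnegative standard deviations:
import torch
import torch.nn as nn

SIGMA_MIN, SIGMA_MAX = -20, 2

logits = torch.randn(4, 16)  # fake state features

# proposed branch: MLP output passed through Softplus, already nonnegative
softplus_head = nn.Sequential(nn.Linear(16, 3), nn.Softplus())
sigma_softplus = softplus_head(logits)

# existing conditioned branch: raw output clamped in log-space, then exponentiated
raw_head = nn.Linear(16, 3)
sigma_exp = torch.clamp(raw_head(logits), min=SIGMA_MIN, max=SIGMA_MAX).exp()

assert (sigma_softplus >= 0).all() and (sigma_exp > 0).all()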