Dear all,
The standard implementation of a stochastic actor with a Gaussian policy generates actions from a mean (mu) and a standard deviation (sigma). In the sigma branch, a Softplus activation on the output layer is commonly used to ensure that sigma is nonnegative. In the recent Tianshou 0.5.0, the class tianshou.utils.net.continuous.ActorProb, which builds standard stochastic actor networks for users, computes sigma either as a state-independent parameter by default, using sigma = (self.sigma_param.view(shape) + torch.zeros_like(mu)).exp(),
or from the state features when conditioned_sigma is set, using sigma = torch.clamp(self.sigma(logits), min=SIGMA_MIN, max=SIGMA_MAX).exp(), where SIGMA_MIN = -20 and SIGMA_MAX = 2.
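For reference, here is a minimal standalone PyTorch sketch of the two existing behaviours. The batch size, action dimension, and feature size are made up for illustration; only the sigma formulas mirror the ActorProb code quoted above:
import torch
import torch.nn as nn

SIGMA_MIN, SIGMA_MAX = -20, 2

mu = torch.randn(4, 3)  # fake batch of 4 means over 3 action dimensions

# default path: sigma is a state-independent learnable parameter, exponentiated
sigma_param = nn.Parameter(torch.zeros(3, 1))
shape = [1] * len(mu.shape)
shape[1] = -1
sigma_default = (sigma_param.view(shape) + torch.zeros_like(mu)).exp()

# conditioned_sigma=True path: sigma comes from the state features ("logits"),
# clamped in log-space and then exponentiated
logits = torch.randn(4, 16)
sigma_head = nn.Linear(16, 3)
sigma_conditioned = torch.clamp(sigma_head(logits), min=SIGMA_MIN, max=SIGMA_MAX).exp()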
Therefore, we consider the following modifications:
Include an additional parameter, declared as sigma_softplus: bool = True and documented as:
:param bool sigma_softplus: True if sigma is computed from the input through a final Softplus layer, False if the handling of sigma depends only on conditioned_sigma. Default to True.
Add a Softplus layer as the last layer of the sigma branch in the __init__ function, when sigma is conditioned:
self._sigma_softplus = sigma_softplus
# a Softplus sigma head implies a state-conditioned sigma branch
self._c_sigma = True if sigma_softplus else conditioned_sigma
if self._c_sigma:
    self.sigma = MLP(
        input_dim,  # type: ignore
        self.output_dim,
        hidden_sizes,
        device=self.device,
    )
    if sigma_softplus:
        # unwrap the MLP into its layers and append a Softplus activation
        sigma_layers = [
            module for module in self.sigma.modules()
            if not isinstance(module, (nn.Sequential, MLP))
        ]
        sigma_layers += [nn.Softplus()]
        self.sigma = nn.Sequential(*sigma_layers)
Add the following check in the forward function:
if self._c_sigma:
    if self._sigma_softplus:
        # the Softplus head already outputs a nonnegative sigma
        sigma = self.sigma(logits)
    else:
        sigma = torch.clamp(self.sigma(logits), min=SIGMA_MIN, max=SIGMA_MAX).exp()
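As a sanity check, here is a minimal standalone sketch, independent of Tianshou and with layer sizes chosen only for illustration, comparing the proposed Softplus branch with the existing clamp-and-exp branch; both produce valid nonnegative standard deviations:
import torch
import torch.nn as nn

SIGMA_MIN, SIGMA_MAX = -20, 2

logits = torch.randn(4, 16)  # fake state features

# proposed branch: MLP output passed through Softplus, already nonnegative
softplus_head = nn.Sequential(nn.Linear(16, 3), nn.Softplus())
sigma_softplus = softplus_head(logits)

# existing conditioned branch: raw output clamped in log-space, then exponentiated
raw_head = nn.Linear(16, 3)
sigma_exp = torch.clamp(raw_head(logits), min=SIGMA_MIN, max=SIGMA_MAX).exp()

assert (sigma_softplus >= 0).all() and (sigma_exp > 0).all()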