How does MonotonicMLP work? #44

CaioDaumann · 2024-03-11T12:25:19Z

CaioDaumann
Mar 11, 2024

Hi,

I have been looking for a kind of network that is monotonically increasing (other than normalising flows), and zuko.nn.MonotonicMLP may be what I am looking for. But I wonder what it is based on. Could you give me a quick explanation of how it works?

Best,
Caio


francois-rozet · 2024-03-11T16:01:11Z

francois-rozet
Mar 11, 2024
Maintainer

Hello @CaioDaumann,

zuko.nn.MonotonicMLP is indeed a monotonic MLP, that is, a (parametric) function $y = f(x)$ for which increasing any input feature $x_i$ leads to an increase in the output features $y_j$.

Now how does it work? One way to construct a monotonic function is by adding (not subtracting) several monotonic functions together. So, to construct a monotonic MLP, you simply have to constrain the weights of the linear layers to be positive and the activation functions to be monotonic.

In Zuko's implementation, the weights are made positive by taking their absolute value (an alternative would be to use softplus or exp), and the activation function is $ELU(x)$ for half the hidden features and $-ELU(-x)$ for the other half, which are both monotonic.
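To make the construction concrete, here is a minimal, dependency-free sketch of the idea described above. It is not Zuko's actual code: the helper names (`positive_linear`, `two_sided_elu`, `monotonic_mlp`), the layer sizes, and the random initialisation are assumptions made for illustration only.

```python
import math
import random


def elu(x):
    """ELU with alpha = 1: x for x > 0, exp(x) - 1 otherwise."""
    return x if x > 0 else math.expm1(x)


def two_sided_elu(h):
    """Apply ELU(x) to the first half of the features and -ELU(-x) to the rest."""
    half = len(h) // 2
    return [elu(v) for v in h[:half]] + [-elu(-v) for v in h[half:]]


def positive_linear(x, weight, bias):
    """Linear layer whose weights are made non-negative with abs(); biases are free."""
    return [
        sum(abs(w) * v for w, v in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


def init_layer(rng, n_in, n_out):
    """Hypothetical initialisation: unconstrained Gaussian weights and biases."""
    weight = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    return weight, bias


def monotonic_mlp(x, layers):
    """Forward pass: positive linear layers with two-sided ELU in between."""
    h = x
    for i, (weight, bias) in enumerate(layers):
        h = positive_linear(h, weight, bias)
        if i < len(layers) - 1:  # no activation after the last layer
            h = two_sided_elu(h)
    return h


rng = random.Random(0)
# 2 inputs -> 8 hidden -> 8 hidden -> 1 output (sizes chosen arbitrarily)
layers = [init_layer(rng, 2, 8), init_layer(rng, 8, 8), init_layer(rng, 8, 1)]

# Increasing the first input feature should not decrease the output.
y_low = monotonic_mlp([0.0, 0.0], layers)[0]
y_high = monotonic_mlp([0.5, 0.0], layers)[0]
print(y_low <= y_high)
```

Because the raw weights stay unconstrained and only their absolute values enter the forward pass, any parameter values yield a monotonic function. Replacing `abs` with softplus or exp, as mentioned above, would be a drop-in change in `positive_linear`.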

2 replies

CaioDaumann Mar 13, 2024
Author

Hi @francois-rozet, thanks for the answer!

Does this hurt the convergence of the model somehow? I imagine that constraining the weights to be positive makes the network less expressive, no? Is it still a universal approximator after that?

Also, did you base this implementation on a paper? If so, could you link it?

Thanks for the help!

francois-rozet Mar 13, 2024
Maintainer

Good question! A monotonic MLP is a universal monotonic function approximator. However, there are some conditions on the activation functions, as you cannot subtract their outputs. For instance, by adding ReLU outputs, it is not possible to construct sigmoid-like functions, or even $f(x) = \min(x, 0)$. However, by using $ReLU(x)$ and $-ReLU(-x)$ activations, it is possible to make any monotonic function.

In Zuko's implementation, we use ELU instead of ReLU because we want our monotonic function to be strictly increasing, but it is not from a paper.

Not sure about the convergence though, especially due to the absolute value.
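The ReLU argument above can be checked on a small example. A network with non-negative weights and only ReLU activations computes a convex function of its input, which is why saturating shapes are out of reach. With both $ReLU(x)$ and $-ReLU(-x)$ available, a hard clip to $[-1, 1]$ (a simple sigmoid-like function) can be built using unit weights and free biases. The function names below are hypothetical and the snippet is only an illustrative sketch.

```python
import math


def relu(x):
    return max(x, 0.0)


def neg_relu(x):
    """-ReLU(-x), which equals min(x, 0)."""
    return -relu(-x)


def hard_clip(x):
    """clip(x, -1, 1) using only positive (unit) weights and free biases."""
    h = neg_relu(1.0 * x - 1.0)  # min(x, 1) - 1
    h = relu(1.0 * h + 2.0)      # max(min(x, 1) + 1, 0)
    return 1.0 * h - 1.0         # clip(x, -1, 1)


def is_midpoint_convex(f, a, b):
    """Check the convexity inequality f((a + b) / 2) <= (f(a) + f(b)) / 2."""
    return f(0.5 * (a + b)) <= 0.5 * (f(a) + f(b))


# The construction matches a reference clip on a few points.
for x in (-5.0, -1.0, 0.3, 1.0, 5.0):
    assert math.isclose(hard_clip(x), min(max(x, -1.0), 1.0), abs_tol=1e-12)

# The clip is increasing but not convex: f(1) = 1 exceeds (f(0) + f(2)) / 2 = 0.5,
# so it cannot be a positively weighted combination of ReLU units alone.
print(is_midpoint_convex(hard_clip, 0.0, 2.0))
```

Swapping ReLU for ELU in the same construction gives a smooth, strictly increasing counterpart, which matches the motivation given for Zuko's choice.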
