Issue with Adam implementation #82

ana-mariacretu · 2018-07-05T09:51:51Z

Thanks for releasing the code!
I noticed that in skip-thoughts/training/optim.py, the roles of beta1 and beta2 from the Adam paper (https://arxiv.org/pdf/1412.6980.pdf) are replaced with 1-beta1 and 1-beta2 when updating the exponential moving averages m_t and v_t, but not when computing the step size lr_t. I have reproduced your model in PyTorch using the default Adam implementation, and the results are comparable. I suspect this is because both (1-beta)^t and beta^t vanish exponentially, so for large t, replacing 1 - (1-beta)^t with 1 - beta^t changes little. Do you have any other ideas why the results still match?
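
To make the argument concrete, here is a minimal sketch that compares the step-size factor sqrt(1 - decay2^t) / (1 - decay1^t) under both conventions. The values `b1 = 0.1` and `b2 = 0.001` are assumptions chosen to mirror the paper's defaults (beta1 = 0.9, beta2 = 0.999) in the swapped convention; they are not read from optim.py, and the function names are hypothetical.

```python
import math


def step_scale(decay1, decay2, t):
    """Adam step-size factor sqrt(1 - decay2**t) / (1 - decay1**t) at step t >= 1."""
    return math.sqrt(1.0 - decay2 ** t) / (1.0 - decay1 ** t)


# Assumed values: b1 and b2 are the weights on the *new* gradient in the
# moving averages, so they correspond to 1 - beta1 and 1 - beta2 in the paper.
b1, b2 = 0.1, 0.001


def paper_scale(t):
    # Bias correction consistent with the moving averages: uses (1 - b)**t.
    return step_scale(1.0 - b1, 1.0 - b2, t)


def swapped_scale(t):
    # Bias correction with the roles left unswapped: uses b**t.
    return step_scale(b1, b2, t)


if __name__ == "__main__":
    for t in (1, 10, 100, 1000, 10000):
        print(f"t={t:>5}  paper={paper_scale(t):.6f}  swapped={swapped_scale(t):.6f}")
```

Under these assumed values, the two factors differ noticeably in the first steps and both approach 1 as t grows, which is consistent with the final results being comparable even if early updates are scaled differently.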
