Gradients are already calculated as sparse gradients. As a very final step, they are made dense again and fed to the optimizer.

Ideally we'd have a sparse Adam optimizer. It would be great if this were a reusable library for Burn, but more likely it means implementing it as a custom kernel.
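To illustrate the idea, here is a minimal sketch of a sparse Adam step in plain Rust rather than Burn's tensor API (the `SparseAdam` name, the flat parameter layout, and the `(row, values)` gradient representation are all illustrative assumptions, not anything that exists in Burn or this repo). Only rows that actually received a gradient are updated; everything else, including their momenta, is left untouched:

```rust
// Illustrative sketch only: a row-sparse Adam update over a flat parameter
// buffer. Dense first/second moments are kept, but only the rows present in
// `grad_rows` are read or written during a step.
struct SparseAdam {
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    m: Vec<f32>, // first moment, dense (one entry per parameter)
    v: Vec<f32>, // second moment, dense
    t: i32,      // step counter for bias correction
}

impl SparseAdam {
    fn new(n_params: usize, lr: f32) -> Self {
        Self {
            lr,
            beta1: 0.9,
            beta2: 0.999,
            eps: 1e-8,
            m: vec![0.0; n_params],
            v: vec![0.0; n_params],
            t: 0,
        }
    }

    /// Apply one Adam step to the rows listed in `grad_rows`.
    /// `params` is a flat [n_rows * dim] buffer; each entry of `grad_rows`
    /// is (row index, gradient values for that row).
    fn step(&mut self, params: &mut [f32], dim: usize, grad_rows: &[(usize, Vec<f32>)]) {
        self.t += 1;
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        for (row, grad) in grad_rows {
            let base = row * dim;
            for (j, &g) in grad.iter().enumerate() {
                let i = base + j;
                self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
                self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
                let m_hat = self.m[i] / bc1;
                let v_hat = self.v[i] / bc2;
                params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
            }
        }
    }
}
```

A custom kernel version would do the same thing on the GPU, launching work only over the touched rows instead of the full parameter tensor.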
After some discussion, I think #38 is the more fruitful direction. Sparse Adam is not a big win, since the momenta will be dense anyway, and taking multiple images per step will also make the gradients more "dense".