Gradients are already calculated as sparse gradients. As a very final step, they are made dense again and fed to the optimizer.

Ideally we'd have a sparse Adam optimizer. It would be great if this were a reusable library for Burn, but more likely it means implementing it as a custom kernel.
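To illustrate the idea, here is a minimal sketch of a sparse Adam step in plain Rust rather than Burn's tensor API (the `SparseAdam` name, the flat parameter layout, and the `(row, values)` gradient representation are all illustrative assumptions, not anything that exists in Burn or this repo). Only rows that actually received a gradient are updated; everything else, including their momenta, is left untouched:

```rust
// Illustrative sketch only: a row-sparse Adam update over a flat parameter
// buffer. Dense first/second moments are kept, but only the rows present in
// `grad_rows` are read or written during a step.
struct SparseAdam {
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    m: Vec<f32>, // first moment, dense (one entry per parameter)
    v: Vec<f32>, // second moment, dense
    t: i32,      // step counter for bias correction
}

impl SparseAdam {
    fn new(n_params: usize, lr: f32) -> Self {
        Self {
            lr,
            beta1: 0.9,
            beta2: 0.999,
            eps: 1e-8,
            m: vec![0.0; n_params],
            v: vec![0.0; n_params],
            t: 0,
        }
    }

    /// Apply one Adam step to the rows listed in `grad_rows`.
    /// `params` is a flat [n_rows * dim] buffer; each entry of `grad_rows`
    /// is (row index, gradient values for that row).
    fn step(&mut self, params: &mut [f32], dim: usize, grad_rows: &[(usize, Vec<f32>)]) {
        self.t += 1;
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        for (row, grad) in grad_rows {
            let base = row * dim;
            for (j, &g) in grad.iter().enumerate() {
                let i = base + j;
                self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
                self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
                let m_hat = self.m[i] / bc1;
                let v_hat = self.v[i] / bc2;
                params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
            }
        }
    }
}
```

A custom kernel version would do the same thing on the GPU, launching work only over the touched rows instead of the full parameter tensor.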
After some discussion, I think #38 is the more fruitful direction. Sparse Adam is not a big win, since the momenta will be dense anyway, and taking multiple images per step will also make the gradients more "dense".