Added EMA of running weights to the current codebase #853

Open
wants to merge 3 commits into main

Conversation

@WhenWen commented Jan 9, 2025

The syntax works like this:
```
--trainer.use_ema True
--trainer.ema_beta 0.995
```

One remaining issue is that the new code isn't compatible with the previous version of the trainer state.
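
For context (not part of the original PR description): with these settings the trainer keeps a shadow copy of the weights and blends it toward the current weights each step. A minimal sketch of that update in JAX, using illustrative names rather than Levanter's actual trainer API:

```python
import jax

def ema_update(ema_params, params, ema_beta=0.995):
    # Blend the running EMA of the weights toward the current weights.
    # A beta close to 1.0 means the EMA changes slowly.
    return jax.tree_util.tree_map(
        lambda e, p: ema_beta * e + (1.0 - ema_beta) * p, ema_params, params
    )
```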

@WhenWen requested a review from dlwh on January 9, 2025 06:40
@dlwh (Member) commented Jan 9, 2025

imho the ema stuff should go in the optimizer state, wdyt?

@WhenWen (Author) commented Jan 9, 2025

Sounds reasonable to me. This should improve compatibility as well. I will try to implement this.

@WhenWen (Author) commented Jan 12, 2025

After trying it out, I found that implementing EMA inside the optimizer limits the generality of this feature: I have to define specific optimizers like AdamWithEMA even though the optimizer is intrinsically the same as Adam. Because of this, I feel it may be better to keep the EMA checkpoint in the trainer state.

@dlwh (Member) commented Jan 14, 2025

i see what you're saying, but like, Optax doesn't really have an "adam" right? It's a bunch of transformations and this is just one more? Let me see
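
For illustration, here is a rough sketch of one way to do what dlwh describes: a pass-through Optax `GradientTransformation` that keeps an EMA of the parameters in its state and can be chained with any base optimizer such as `optax.adamw`, so no dedicated AdamWithEMA is needed. The names `ParamEmaState` and `param_ema` are hypothetical, not part of Optax or Levanter.

```python
from typing import Any, NamedTuple

import jax
import optax


class ParamEmaState(NamedTuple):
    ema_params: Any  # pytree mirroring the model parameters


def param_ema(beta: float = 0.995) -> optax.GradientTransformation:
    def init_fn(params):
        # Start the EMA at the initial parameter values.
        return ParamEmaState(ema_params=jax.tree_util.tree_map(lambda p: p, params))

    def update_fn(updates, state, params=None):
        if params is None:
            raise ValueError("param_ema requires params to be passed to update()")
        new_ema = jax.tree_util.tree_map(
            lambda e, p: beta * e + (1.0 - beta) * p, state.ema_params, params
        )
        # Gradient updates pass through unchanged; only the EMA state changes.
        return updates, ParamEmaState(ema_params=new_ema)

    return optax.GradientTransformation(init_fn, update_fn)


# Usage: chain it with the base optimizer, so the EMA lives in the optimizer state.
optimizer = optax.chain(optax.adamw(learning_rate=1e-3), param_ema(beta=0.995))
```

Note that, placed last in the chain, the transform receives the pre-step parameters, so the EMA here tracks the weights as of the previous step; whether to EMA before or after applying the update is an implementation detail.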
