Was there any rationale behind the early stopping, such as MLM accuracy, or was the stopping point chosen arbitrarily? I am asking because I want to know how much training data is enough for MLM, especially for a high-resource language like English (en).
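For context, here is a rough sketch of the kind of MLM-accuracy-based early stopping I have in mind; the class, parameter names, and threshold values are hypothetical and not taken from this repo:

```python
# Minimal sketch of early stopping on held-out MLM accuracy.
# (Hypothetical names and values; not claiming this matches your training code.)

class EarlyStopping:
    """Stop MLM pre-training once masked-token accuracy stops improving."""

    def __init__(self, patience: int = 3, min_delta: float = 1e-3):
        self.patience = patience      # evaluations to wait without improvement
        self.min_delta = min_delta    # minimum accuracy gain that counts as progress
        self.best_accuracy = float("-inf")
        self.bad_evals = 0

    def step(self, mlm_accuracy: float) -> bool:
        """Record one evaluation result; return True if training should stop."""
        if mlm_accuracy > self.best_accuracy + self.min_delta:
            self.best_accuracy = mlm_accuracy
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


if __name__ == "__main__":
    # Toy accuracies from periodic held-out MLM evaluations.
    eval_accuracies = [0.41, 0.47, 0.52, 0.53, 0.531, 0.530, 0.529, 0.531]
    stopper = EarlyStopping(patience=3, min_delta=1e-3)
    for step, acc in enumerate(eval_accuracies, start=1):
        if stopper.step(acc):
            print(f"stop at eval {step}, best MLM accuracy {stopper.best_accuracy:.3f}")
            break
```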
For the language SFT training for English, did you use the entire Wikipedia or just a subset of it?