Hey, great community!
I have a problem with the SFT notebook.
I'm trying to run it as is (without changing a thing).
On Google Colab, it works great!
On a Mac Pro M3, I get a weird loss and nan for grad_norm after a few iterations.
I suspect an exploding gradient.
I already tried use_mps=False.
One thing that worked, but only on a simple dataset: per_device_eval_batch_size=4 and not using MPS.
It didn't work on a more complicated dataset, even though the same notebook runs on Google Colab without any change.
Versions:
Torch 2.5.1
Datasets 3.2.0
TRL 0.13.0
Transformers 4.47.0
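A minimal sketch (not from the notebook) for narrowing down whether the NaN comes from the MPS backend rather than from the data: run the same seeded forward/backward step on CPU and, if available, on MPS, then compare the loss and the total gradient norm before clipping. The toy nn.Linear model, the random batch and the helper name one_step_stats are hypothetical stand-ins for the SFT model and dataset; a real check would swap in the notebook's model and one of its batches.

```python
import torch
from torch import nn


def one_step_stats(device: str, seed: int = 0) -> tuple[float, float]:
    """Run one forward/backward pass of a toy model on `device`.

    Hypothetical helper: the model and batch are stand-ins, not the SFT setup.
    Returns (loss, grad_norm), where grad_norm is the total gradient norm
    before clipping, as returned by clip_grad_norm_.
    """
    torch.manual_seed(seed)  # same weights and data for every device
    model = nn.Linear(8, 2)  # created on CPU, so the seeded values match
    inputs = torch.randn(4, 8)
    targets = torch.randint(0, 2, (4,))

    model.to(device)
    logits = model(inputs.to(device))
    loss = nn.functional.cross_entropy(logits, targets.to(device))
    loss.backward()
    grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    return loss.item(), grad_norm.item()


cpu_loss, cpu_norm = one_step_stats("cpu")
print(f"cpu: loss={cpu_loss:.4f} grad_norm={cpu_norm:.4f}")

# Only compare against MPS when the backend is actually present.
if torch.backends.mps.is_available():
    mps_loss, mps_norm = one_step_stats("mps")
    print(f"mps: loss={mps_loss:.4f} grad_norm={mps_norm:.4f}")
```

If the CPU numbers stay finite while the MPS ones drift or turn into nan for the same seed, that would point at the backend rather than at a genuinely exploding gradient.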