Exploding grad SFT on Mac m3 pro #139

Bar-A-94 · 2024-12-22T19:45:23Z

Hey, great community!
I have a problem with the SFT notebook.

I'm trying to run it as is (without changing a thing).
On Google Colab it works great!
On the Mac M3 Pro I get a weird loss and NaN for grad_norm after a few iterations.
I suspect an exploding gradient.

I already tried use_mps=False.
The one thing that worked, and only on a simple dataset, was per_device_eval_batch_size=4 together with not using MPS.
That didn't work on more complicated data, while the same notebook works on Google Colab without any change.
Versions:
Torch 2.5.1
Datasets 3.2.0
Trl 0.13.0
Transformers 4.47.0
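
To pin down when grad_norm first turns into NaN, one option is to scan the trainer's logged metrics after a run. The snippet below is only a minimal sketch: it assumes the logs are a list of dicts with "step" and "grad_norm" keys (the shape that `trainer.state.log_history` usually has in recent Transformers versions), and the sample entries are made-up placeholders, not values from this run. The helper name `first_non_finite_step` is hypothetical.

```python
import math


def first_non_finite_step(log_history, key="grad_norm"):
    """Return the first logged step whose `key` value is NaN or infinite, else None."""
    for entry in log_history:
        value = entry.get(key)
        if value is None:
            # Some entries (e.g. eval logs) do not carry this metric; skip them.
            continue
        if not math.isfinite(float(value)):
            return entry.get("step")
    return None


# Hypothetical log entries for illustration only (not taken from the notebook).
sample_logs = [
    {"step": 10, "loss": 1.92, "grad_norm": 3.1},
    {"step": 20, "loss": 2.75, "grad_norm": 41.8},
    {"step": 30, "loss": 0.0, "grad_norm": float("nan")},
]

print(first_non_finite_step(sample_logs))
```

With a real run, passing `trainer.state.log_history` instead of `sample_logs` should show whether grad_norm grows steadily before becoming NaN (which would point to a genuine exploding gradient) or jumps to NaN abruptly (which would hint at a numerical issue on the device).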

calvdee · 2025-01-07T15:56:29Z

@Bar-A-94 how did you get the environment set up on your M3, given that bitsandbytes does not currently support Apple Silicon? Did you just drop the dependency?

Bar-A-94 · 2025-01-08T09:02:27Z

Well, it didn't ask me about it at all. It let me run it freely.
