Exploding grad SFT on Mac m3 pro #139

Open
Bar-A-94 opened this issue Dec 22, 2024 · 2 comments
Comments

@Bar-A-94

Hey, great community!
I have a problem with the SFT notebook.

I'm running it as is (without changing a thing):
Google Colab: works great.
Mac M3 Pro: I get a weird loss and nan for grad_norm after a few iterations.
I suspect exploding gradients.

I already tried use_mps=False.
The only thing that worked, and only on a simple dataset, was per_device_eval_batch_size=4 together with not using MPS.
It didn't work on more complicated data, while the same notebook works on Google Colab without any change.
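
To pin down when the run goes bad, the logged metrics can be scanned for the first non-finite value. The sketch below is only an illustration: it assumes the entries look like those in `trainer.state.log_history` (dicts with `step`, `loss`, and `grad_norm` keys), the helper name `first_non_finite_step` is hypothetical, and the sample values are made up rather than taken from this run.

```python
import math


def first_non_finite_step(log_history):
    """Return (step, key) for the first NaN/inf loss or grad_norm, or None."""
    for entry in log_history:
        for key in ("loss", "grad_norm"):
            value = entry.get(key)
            # Entries without this key (e.g. eval-only logs) are skipped.
            if value is not None and not math.isfinite(float(value)):
                return entry.get("step"), key
    return None


# Made-up log entries for illustration only (not from the run above).
example_log = [
    {"step": 10, "loss": 1.92, "grad_norm": 3.1},
    {"step": 20, "loss": 2.40, "grad_norm": 57.8},
    {"step": 30, "loss": 0.0, "grad_norm": float("nan")},
]
print(first_non_finite_step(example_log))
```

If the grad_norm values climb steeply before the first nan, that would be consistent with exploding gradients; a nan that appears with no preceding growth may point at something else, such as a numerical issue on the device.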
versions:
Torch 2.5.1
Datasets 3.2.0
Trl 0.13.0
Transformers 4.47.0

image

@calvdee

calvdee commented Jan 7, 2025

@Bar-A-94 how did you get the environment set up on your M3, given that bitsandbytes does not currently support Apple Silicon? Did you just drop the dependency?
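
One way to check this is to print which of the relevant packages are actually installed in the environment. This is a minimal sketch using only the standard library; the helper name `report_versions` and the package list are assumptions chosen for illustration.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Map each package name to its installed version, or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # platform.machine() reports the CPU architecture of the running interpreter.
    print("machine:", platform.machine())
    names = ["torch", "datasets", "trl", "transformers", "bitsandbytes"]
    for name, version in report_versions(names).items():
        print(f"{name}: {version}")
```

If bitsandbytes shows up as not installed, the notebook is presumably running without it.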

@Bar-A-94
Author

Bar-A-94 commented Jan 8, 2025

Well, it didn't ask me about it at all. It let me run it freely.
