save_steps not working, checkpoint getting generated more than once per epoch #1542
Comments
@guruprasaad123 I am wondering whether this is correct?
Basically, if you do the calculation with your … The screenshot above is from …
@DamithDR I believe not, because I took the code directly from the official documentation and produced the results with the code below:
Source: simpletransformers | Tips and Tricks
Results:
Conclusion: Since I ran the training script for 4 epochs, I expected a checkpoint to be created only after the 4th epoch, which isn't the case here. Please let me know if I am wrong.
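One way to see the mismatch: `save_steps` is counted in training (optimizer) steps, not epochs. A minimal sketch of the arithmetic, assuming the trainer increments its global step once per optimizer update (every `gradient_accumulation_steps` batches) and keeps the final partial batch. The helper names and the dataset size are hypothetical; 40504 examples at batch size 8 is chosen only so that one epoch comes out to the 5063 steps visible in the checkpoint names.

```python
import math


def steps_per_epoch(num_examples: int, train_batch_size: int,
                    gradient_accumulation_steps: int = 1) -> int:
    """Optimizer steps in one epoch (assumes the last partial batch is kept)."""
    batches = math.ceil(num_examples / train_batch_size)
    # One optimizer step per `gradient_accumulation_steps` batches.
    return batches // gradient_accumulation_steps


def save_steps_for_epochs(every_n_epochs: int, num_examples: int,
                          train_batch_size: int,
                          gradient_accumulation_steps: int = 1) -> int:
    """A save_steps value that lands on every `every_n_epochs`-th epoch boundary."""
    return every_n_epochs * steps_per_epoch(
        num_examples, train_batch_size, gradient_accumulation_steps)


# Hypothetical dataset: 40504 examples with batch size 8.
print(steps_per_epoch(40504, 8))           # 5063
print(save_steps_for_epochs(4, 40504, 8))  # 20252
```

With the 2000-step interval seen in this report, a 4-epoch run of that size passes 10 multiples of 2000 (20252 // 2000 = 10), so step-based checkpoints appear long before the 4th epoch ends.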
Describe the bug
We are trying to save model checkpoints less frequently because we are running low on storage. Our idea is to save a checkpoint only every 4 or 5 epochs. Please check the code below, which we used to accomplish this.
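For comparison, a minimal sketch of arguments intended to save only every 4 epochs. It assumes the simpletransformers `ModelArgs` field names `save_steps`, `save_model_every_epoch`, `save_eval_checkpoints`, `evaluate_during_training_steps` and `save_optimizer_and_scheduler` (check them against your installed version). The main point is that `save_model_every_epoch` defaults to on and saves at the end of every epoch independently of `save_steps`, so it has to be switched off. The 5063 steps per epoch comes from the `checkpoint-5063-epoch-1` name in this report; the epoch count and model are placeholders.

```python
# Hypothetical settings for illustration; 5063 steps per epoch is taken from
# the "checkpoint-5063-epoch-1" directory name in this report.
STEPS_PER_EPOCH = 5063
SAVE_EVERY_N_EPOCHS = 4

model_args = {
    "num_train_epochs": 8,
    # save_steps counts optimizer steps, so convert epochs to steps.
    "save_steps": SAVE_EVERY_N_EPOCHS * STEPS_PER_EPOCH,
    # When on, a "checkpoint-<step>-epoch-<n>" directory is written at the
    # end of every epoch regardless of save_steps.
    "save_model_every_epoch": False,
    # The best_model directory in the report suggests evaluation during
    # training is on (this also needs eval data passed to training).
    "evaluate_during_training": True,
    "evaluate_during_training_steps": STEPS_PER_EPOCH,
    # Keep each evaluation from also writing its own "checkpoint-<step>".
    "save_eval_checkpoints": False,
    # Smaller checkpoints, at the cost of not resuming optimizer state.
    "save_optimizer_and_scheduler": False,
}

# Passed to a model as usual, e.g. (not executed here; placeholder model):
# model = ClassificationModel("roberta", "roberta-base", args=model_args)
print(model_args["save_steps"])  # 20252
```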
To Reproduce
Steps to reproduce the behavior:
Run `python3 train.py` to start the training script.
Expected behavior
We expected the process to generate a checkpoint only after the 4th epoch, the 8th epoch, and so on. Instead, it generates a checkpoint after every epoch and, in addition, every 2000 steps.
Screenshots
The process generates the following checkpoints:
outputs
|----checkpoint-4000
|----checkpoint-10000
|----best_model
|----checkpoint-5063-epoch-1
|----checkpoint-6000
|----checkpoint-10126-epoch-2
|----checkpoint-8000
|----checkpoint-2000
|----eval_results.txt
|----training_progress_scores.csv
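The listing is consistent with two save rules firing independently. A minimal simulation, assuming step checkpoints are named `checkpoint-<global_step>` whenever `global_step % save_steps == 0`, and end-of-epoch checkpoints are named `checkpoint-<global_step>-epoch-<n>` when `save_model_every_epoch` is on. This is a simplified model of the behavior, not the library's code; the function name is hypothetical.

```python
def checkpoint_names(steps_per_epoch, num_epochs, save_steps, save_model_every_epoch):
    """Checkpoint directory names produced by the two save rules (simplified)."""
    names = []
    global_step = 0
    for epoch in range(1, num_epochs + 1):
        for _ in range(steps_per_epoch):
            global_step += 1
            # Step-based rule: fires on every multiple of save_steps.
            if save_steps > 0 and global_step % save_steps == 0:
                names.append(f"checkpoint-{global_step}")
        # Epoch-based rule: fires at the end of every epoch when enabled.
        if save_model_every_epoch:
            names.append(f"checkpoint-{global_step}-epoch-{epoch}")
    return names


# Two epochs of 5063 steps, 2000-step interval, per-epoch saving on:
print(checkpoint_names(5063, 2, 2000, True))
# ['checkpoint-2000', 'checkpoint-4000', 'checkpoint-5063-epoch-1',
#  'checkpoint-6000', 'checkpoint-8000', 'checkpoint-10000',
#  'checkpoint-10126-epoch-2']

# Saving only every 4 epochs over an 8-epoch run, per-epoch saving off:
print(checkpoint_names(5063, 8, 4 * 5063, False))
# ['checkpoint-20252', 'checkpoint-40504']
```

The first call reproduces every `checkpoint-*` directory in the listing above, which suggests the extra checkpoints come from the default step interval plus per-epoch saving rather than from `save_steps` being ignored.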
Additional context
None