First of all, I want to wish you a Merry Christmas and thank you for this wonderful Christmas gift! Amazing work! I noticed that when resuming training from a checkpoint, the run continues in the directory specified in the train.json file, which is not necessarily the correct one: the actual directory might be directory_+_timestamp.
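One possible workaround, sketched under assumptions: look for the configured directory and any sibling named with an extra `_<suffix>`, then resume from whichever holds the most recently written checkpoint. The function name `resolve_run_dir` is hypothetical, and the `_<suffix>` naming is only inferred from the directory_+_timestamp description above; the project's real layout may differ.

```python
import glob
from pathlib import Path
from typing import Optional


def resolve_run_dir(configured: Path,
                    checkpoint_name: str = "checkpoint.json") -> Optional[Path]:
    """Pick the directory to resume from (hypothetical helper).

    Candidates are the configured directory itself and any sibling whose
    name is "<configured name>_<suffix>". Only candidates that contain a
    checkpoint file are considered; the one with the newest checkpoint wins.
    """
    # Escape the name so characters like "[" are not treated as wildcards.
    pattern = glob.escape(configured.name) + "_*"
    candidates = [configured, *configured.parent.glob(pattern)]

    with_checkpoint = [d for d in candidates if (d / checkpoint_name).is_file()]
    if not with_checkpoint:
        return None  # nothing to resume from

    # Newest checkpoint by modification time.
    return max(with_checkpoint,
               key=lambda d: (d / checkpoint_name).stat().st_mtime)
```

A caller would pass the directory read from train.json and fall back to a fresh run when the result is `None`.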
Another small bug I noticed: the duration in checkpoint.json can be wrong when training is stopped and later resumed with some hours between runs. For example:
Train for 1h
Pause for 1h
Train for 1h more
This reports a 3h duration when it should be 2h.
This happens because we only log the start time, so the pause is counted as training time. It should probably work more like a stopwatch, since training can be assumed to start and stop at arbitrary times.
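A minimal sketch of the stopwatch idea, not the project's actual code: accumulate elapsed time only while training is running, and store the accumulated total in the checkpoint instead of the start time. The class name `TrainingStopwatch` and the `elapsed_seconds` field are hypothetical.

```python
import time
from typing import Callable, Optional


class TrainingStopwatch:
    """Accumulates only the time spent training, across pauses and resumes."""

    def __init__(self, elapsed_seconds: float = 0.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Total from earlier runs, e.g. a value loaded from the checkpoint.
        self.elapsed_seconds = elapsed_seconds
        self._clock = clock
        self._started_at: Optional[float] = None  # None while paused

    def start(self) -> None:
        """Begin (or resume) timing; a no-op if already running."""
        if self._started_at is None:
            self._started_at = self._clock()

    def stop(self) -> None:
        """Pause timing and fold the current run into the total."""
        if self._started_at is not None:
            self.elapsed_seconds += self._clock() - self._started_at
            self._started_at = None

    def total(self) -> float:
        """Training time so far, including the run in progress."""
        running = 0.0
        if self._started_at is not None:
            running = self._clock() - self._started_at
        return self.elapsed_seconds + running
```

On resume, the saved total would be passed back in as `elapsed_seconds`, so the hour spent paused never enters the sum. The clock is injectable only so the behaviour can be checked without waiting in real time.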
Yes, I noticed it too. I started a training session in the evening and resumed it the next day, and the reported duration was indeed inflated. I didn’t perceive it as an “issue.” In my fork, I shared the file plotter.py. If you feel like it, take a look. It’s my first attempt at Python. I relied heavily on AIs for help, but I find it “fun.”