There appears to be a memory leak in the validation routine: the GPU I'm using (an A40 with 48 GB of VRAM) runs out of memory when training on the `predict flu` task for 125 epochs.
We discussed this on Slack, but I just wanted to confirm that this is a known bug. I believe there's a memory leak somewhere in `on_train_epoch_end`, but I haven't been able to find it.
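For anyone trying to narrow this down: one way to check whether a particular hook is responsible is to log allocated CUDA memory at the end of the suspect hooks and watch whether it grows epoch over epoch. A minimal callback sketch (the hook names are standard Lightning; the class itself is hypothetical, not from this repo):

```python
import torch
import pytorch_lightning as pl


class CudaMemoryLogger(pl.Callback):
    """Print allocated/reserved CUDA memory at the end of each epoch."""

    def _log(self, tag: str) -> None:
        if torch.cuda.is_available():
            alloc = torch.cuda.memory_allocated() / 2**20
            reserved = torch.cuda.memory_reserved() / 2**20
            print(f"[{tag}] allocated={alloc:.1f} MiB reserved={reserved:.1f} MiB")

    def on_train_epoch_end(self, trainer, pl_module):
        self._log("train_epoch_end")

    def on_validation_epoch_end(self, trainer, pl_module):
        self._log("validation_epoch_end")
```

Attach it with `Trainer(callbacks=[CudaMemoryLogger()])`; if `allocated` climbs monotonically across epochs, something is holding references to GPU tensors between epochs.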
This isn't a complete solution, but I think the memory leak is happening in the metric bootstrapping during the validation loops. I can't pinpoint exactly where the memory is accumulating, but setting `--model.val_bootstraps=0` after commit 370d41f should stop the leak.
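For reference, here's a minimal sketch of the failure mode this points at: bootstrapped validation metrics that accumulate predictions across the epoch will leak GPU memory if the buffers hold live device tensors or are never cleared. Everything below (class name, metric, hyperparameters) is hypothetical and not this repo's actual module; it just shows the pattern and the two fixes (detach/move to CPU, and clear the buffers after bootstrapping):

```python
import torch
from torch import nn
import pytorch_lightning as pl


class BootstrapLeakSketch(pl.LightningModule):
    """Hypothetical module illustrating the bootstrap-metric leak pattern."""

    def __init__(self, val_bootstraps: int = 100):
        super().__init__()
        self.net = nn.Linear(16, 1)
        self.val_bootstraps = val_bootstraps
        self._val_preds: list[torch.Tensor] = []
        self._val_targets: list[torch.Tensor] = []

    def forward(self, x):
        return self.net(x).squeeze(-1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.binary_cross_entropy_with_logits(self(x), y)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        # Detaching and moving to CPU here is what prevents epoch-over-epoch
        # growth: storing live GPU tensors keeps device memory pinned.
        self._val_preds.append(self(x).detach().cpu())
        self._val_targets.append(y.detach().cpu())

    def on_validation_epoch_end(self):
        preds = torch.cat(self._val_preds)
        targets = torch.cat(self._val_targets).bool()
        n = preds.shape[0]
        scores = []
        for _ in range(self.val_bootstraps):
            idx = torch.randint(0, n, (n,))  # resample with replacement
            scores.append(((preds[idx] > 0) == targets[idx]).float().mean())
        if scores:  # with val_bootstraps=0 the resampling is skipped entirely
            self.log("val/bootstrap_acc_mean", torch.stack(scores).mean())
        # Clearing the buffers matters as much as detaching: if these lists
        # persist, memory usage grows linearly with the epoch count.
        self._val_preds.clear()
        self._val_targets.clear()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```

Skipping the resampling loop entirely when `val_bootstraps=0` is consistent with the workaround above stopping the leak, which is why I suspect the buffers feeding the bootstrap rather than the resampling itself.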