np.memmap memory leak and correct val sampling #16

Open
wants to merge 2 commits into base: main

Conversation

@haeggee (Collaborator) commented Jun 18, 2024

  • Fix a memory leak with np.memmap (see e.g. here) by constructing the object that holds the memmap inside the dataloader each time a new item is fetched, rather than once outside it. In some multi-GPU setups, the old behavior led to overloading the (shared/virtual) memory and therefore to crashes.
  • Other fixes:
    • Make the val batches used inside the training loop deterministic by resetting the val loader before each evaluation. This removes sampling noise from the evaluation, so differences between evaluations come only from the model parameters.
    • Fix the final evaluation on the full validation set, which previously did not always use the same set of batches because the val loader was not reset properly. Also log the deterministic val batches (as above) and the full val set separately.
    • Remove the unused and outdated arxiv data.
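
For readers who have not seen the pattern, here is a minimal sketch of the two main ideas above. It is not the code in this PR: the names `MemmapTokenDataset` and `val_batches`, the flat `uint16` token file, and the seed-based reset are all assumptions made for illustration, and plain NumPy stands in for the actual dataloader.

```python
import os

import numpy as np


class MemmapTokenDataset:
    """Hypothetical dataset over a flat binary file of token ids."""

    def __init__(self, path, sequence_length, dtype=np.uint16):
        self.path = path
        self.sequence_length = sequence_length
        self.dtype = dtype
        # Only read the file size here; no long-lived memmap is stored on self.
        n_tokens = os.path.getsize(path) // np.dtype(dtype).itemsize
        self.n_samples = n_tokens - sequence_length

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # Open the memmap per item, so no mapping outlives this call.
        data = np.memmap(self.path, dtype=self.dtype, mode="r")
        # np.array(...) copies the slices out of the mapped file.
        x = np.array(data[idx : idx + self.sequence_length], dtype=np.int64)
        y = np.array(data[idx + 1 : idx + 1 + self.sequence_length], dtype=np.int64)
        del data  # drop the reference to the mapping right away
        return x, y


def val_batches(dataset, batch_size, num_batches, seed=0):
    """Yield the same validation batches on every call (assumed reset scheme)."""
    # A fresh generator with a fixed seed plays the role of "resetting" the loader.
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        idxs = rng.integers(0, len(dataset), size=batch_size)
        xs, ys = zip(*(dataset[int(i)] for i in idxs))
        yield np.stack(xs), np.stack(ys)
```

Calling `val_batches(val_dataset, batch_size, num_batches)` at every evaluation step would then score the model on identical batches each time, while the full validation pass can iterate over the dataset in order and be logged under a separate key.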
