In the line here:
https://github.com/statnet/EpiModelHPC/blob/9491c5e5de29616b824899fa552decddc685150d/R/check_cp.R#L44
the number of cp data files is required to be 16. Is there a reason this is hard-coded? When running jobs with very large memory requirements, it makes sense to use fewer nodes, since memory is shared across the nodes; but this means those jobs start fresh each time they are checkpointed, because the directory check returns NULL. Perhaps having this check the number of nodes assigned to the job would make more sense.
This was hard-coded when all the nodes on Hyak were 16-core. We haven't had to use checkpointing in years because of tergmLite, so I haven't needed to update it. Feel free to contribute a PR if you would like to generalize this.
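A rough sketch of one way the check could be generalized, not the actual check_cp() code: it assumes the job runs under Slurm, and the SLURM_NTASKS variable and the ".rda" file pattern are illustrative assumptions rather than details taken from the package (SLURM_NNODES could be swapped in if the files are written per node rather than per task).

```r
# Minimal sketch, assuming a Slurm job and ".rda" checkpoint files;
# not the actual check_cp() implementation.

expected_cp_files <- function(default = 16) {
  # Prefer the task count the scheduler assigned to this job, if available.
  ntasks <- suppressWarnings(as.integer(Sys.getenv("SLURM_NTASKS", unset = NA)))
  if (!is.na(ntasks) && ntasks > 0) {
    return(ntasks)
  }
  # Otherwise fall back to the historical hard-coded value.
  default
}

cp_dir_complete <- function(cp_dir) {
  # TRUE only when the checkpoint directory holds one data file per task.
  n_files <- length(list.files(cp_dir, pattern = "\\.rda$"))
  n_files == expected_cp_files()
}
```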
In the line here"
https://github.com/statnet/EpiModelHPC/blob/9491c5e5de29616b824899fa552decddc685150d/R/check_cp.R#L44