[BUG] Bad allocation warning #113
Comments
It's working fine for me with the latest version of Sire. (Note that I shortened the number of moves for debugging purposes.)
Hi Lester, can I just ask whether you used the SLURM script to run it, or did you run it from the command line? I will also try it with the latest version.
I just ran it directly from the command line.
It may be worth checking the amount of memory consumed by your job on your cluster, or trying to run the job again asking for more memory from the scheduler.
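For example, the peak memory of a finished job can usually be checked with SLURM's sacct (the job ID below is a placeholder):

# Check peak resident memory (MaxRSS) and the requested memory for a hypothetical job ID.
sacct -j 123456 --format=JobID,JobName,MaxRSS,ReqMem,State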
Have you been able to see if this was caused by running out of memory?
Hi @chryswoods, sorry for not updating you sooner, I've been waiting for some of the calculations to run to see if the error was still happening. What I did was to check my slurm configurations and, after making sure that wasn't an issue, I increased the number of cycles to 20 (rather than 5, which is what I had chosen to increase efficiency) and that seems to have fixed the bad allocation error. The only problem with this is that it has increased my computation time by almost a factor of two. I was wondering if it would make sense to raise the bad allocation as an actual error rather than a UserWarning?
I also realised that my Python is 3.10, which I believe is an older version going by the installation instructions. I will install the correct version and see if this also fixes things.
Thanks - yes, I agree that the naming of the warning is confusing. I'll look to translate it into a proper error. Your Python 3.10 is fine. We support Python versions 3.9, 3.10 and 3.11. The version of Python shouldn't have any impact on this bug. We aim to support the last three major releases of Python, i.e. next year we will transition to supporting 3.10, 3.11 and 3.12.
Thanks, that would be really helpful!
Hi, just as an update, I tried rerunning the runs on a different computer (this time using an HPC) but I got the same error again.
I spoke to Anna Herz and she suggested changing ncycles to 4 so that there is one cycle per nanosecond of simulation. I'm currently testing this out both on my workstation and on an HPC.
I've just tried it again with the following config.
If I'm reading this right, you are saving 10000 trajectory frames (500000 x 4 / 200), with 2500 buffered in memory each cycle. (I think it just stores the coordinates.) I'm not sure if this is what is causing the memory to overflow.
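As a rough back-of-the-envelope sketch of why that could matter (the atom count and precision here are assumptions, not values from this system):

# Hypothetical estimate of buffered coordinate memory per cycle:
# ~50000 atoms x 3 coordinates x 8 bytes (double precision) x 2500 buffered frames.
echo $(( 50000 * 3 * 8 * 2500 / 1024 / 1024 )) MB   # close to 3 GB per cycle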
@lohedges so would you suggest increasing the buffered coordinates frequency? If I understand correctly, that would decrease the amount of memory stored in each cycle? I will try again with that, and also with increasing the number of cycles to 8.
Well, I'd just try not saving any frames, or a minimal number. That would be an easy way to test whether it's this part of the code that's causing the problem.
I've tried again with the following config on my workstation, and got the same bad_alloc warning.
Could you try adjusting that setting further? You could also add config options to tell somd to only save coordinates for lambda=0 and lambda=1. I have (in another branch that will be merged into 2023.5.0) changed the warning into an error that says that the code is using too much memory.
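For illustration, here is a minimal sketch of the kind of somd.cfg settings being discussed; the option names and values are assumptions rather than a copy of the attached config:

# Hypothetical somd.cfg fragment - adjust names and values to your actual setup.
ncycles = 8
nmoves = 250000
# Buffer coordinates less often (a larger value means fewer frames held in memory per cycle).
buffered coordinates frequency = 5000
# Or disable trajectory saving entirely while testing whether this causes the bad_alloc.
save coordinates = False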
Just to add that PR #134 includes a fix for the wrapping behaviour.
Hi, sorry again for the long wait. I ran the run with the below config options on the HPC and still get the bad_alloc warning. I did set the options you suggested.
Yes, you could try completely disabling coordinate saving by turning off the corresponding config option. Are you able to control how much memory you are requesting on your cluster via your slurm script? How much are you requesting? I would expect you to need at least 4-16 GB to be able to run this job. I took a look at your slurm script and I couldn't see where you are requesting the amount of memory you want to use. I know that on some clusters, if you don't specify, then you can end up with very small amounts, e.g. 1-2 GB.
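For illustration, a minimal sketch of how an explicit memory request might look in the slurm script; the directive values below are placeholders, not taken from the attached script:

#!/bin/bash
#SBATCH --job-name=somd-freenrg   # hypothetical job name
#SBATCH --mem=8G                  # explicitly request 8 GB; some clusters default to 1-2 GB otherwise
#SBATCH --gres=gpu:1              # assumes one GPU is needed for the CUDA platform

# $lambda is assumed to be set elsewhere (e.g. derived from SLURM_ARRAY_TASK_ID).
$BSS_HOME/somd-freenrg -C ./somd.cfg -l $lambda -c ./somd.rst7 -t ./somd.prm7 -m ./somd.pert -p CUDA 1> ./somd.out 2> ./somd.err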
Were you able to run the job on your cluster? Let us know if everything is ok. We will automatically close the issue if there's no update by the end of the month.
Closing due to inactivity - please feel free to reopen if you still need help.
Describe the bug
I'm running somd-freenrg with a slurm submission script and my runs seem to only do 1 cycle (according to somd.out). Looking at somd.err, I only get a UserWarning, and slurm does not produce any errors.

To Reproduce
I've attached one of my lambda run directories to the issue with all my files.
$BSS_HOME/somd-freenrg -C ./somd.cfg -l $lambda -c ./somd.rst7 -t ./somd.prm7 -m ./somd.pert -p CUDA 1> ./somd.out 2> ./somd.err
Expected behavior
All cycles should finish normally, and somd.err should not raise the bad_alloc warning.

Input files
scripts_for_issue.zip
issue.zip