Memory requirement formula correction #27
Apparently, the actual GPU memory needed is still larger than indicated, leading to a crash with CUDA error 2 (out of memory).
When further reducing alignment sizes so memory consumption stops being a problem, large MSAs still cause crashes with CUDA error 77 (illegal memory access) as shown in the example below:
Since this apparently has not been a common occurrence in the past, I assume the very large alignment is causing the issue. I'll investigate and report back if I find the problem in the CUDA kernels.
Hi, I wonder if this is related to #34? I just opened it and am curious whether you found a solution.
While reducing the alignment sizes of my current dataset in order to be able to compute couplings on the GPU, I noticed a large discrepancy between results from the formula in the README and the actual RAM needed when running CCMpred.
I know that CCMpred is no longer actively maintained, but to help fellow researchers running into the same issue, here is the corrected formula based on the calculation in the source code (ccmpred.c, lines 437-441):
Padded: 4 * (4 * (L * L * 32 * 21 + L * 20) + N * L * 2 + N * L * 32 + N) + 2 * N * L
Unpadded: 4 * (4 * (L * L * 21 * 21 + L * 20) + N * L * 2 + N * L * 21 + N) + 2 * N * L
The internal `size_t mem_needed` is, however, only used for the progress output; the actual allocation happens separately across a variety of different memory blocks. I'll do some further testing with samples calculated to barely fit into GPU memory to see whether the CUDA allocations are equivalent.