memory peak #9
Hi @dcopetti, Usually the most memory-intensive steps are (1) the Forward-Backward algorithm (i.e., the correction step) and (2) loading the FAI file into memory (e.g., when the short read files are too large). For the first case, the allocation of multi-dimensional arrays creates an overhead during the Forward-Backward calculation (per read). Since each long read is corrected in parallel, the memory overhead increases with the number of threads allocated to Hercules. I believe there are many opportunities to optimize how the Forward-Backward values are stored per read and how the Viterbi algorithm is calculated, which will hopefully be addressed in newer versions of Hercules. Thanks.
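A back-of-envelope sketch of why memory grows with thread count, as described above. The numbers and the matrix shapes here are illustrative assumptions (not Hercules' actual data layout): we assume one double-precision forward matrix and one backward matrix of shape (num_states × read_length) per read being corrected.

```python
# Rough Forward-Backward memory estimate per worker thread.
# Shapes and byte sizes below are assumptions for illustration only.

def fb_memory_bytes(read_length, num_states, bytes_per_cell=8):
    """Memory for one forward + one backward matrix for a single read."""
    return 2 * num_states * read_length * bytes_per_cell

def peak_estimate(read_length, num_states, threads):
    """Peak grows roughly linearly with the number of reads corrected in parallel."""
    return threads * fb_memory_bytes(read_length, num_states)

# Example: a 10 kb long read, a hypothetical 1000-state HMM, 32 threads.
per_read = fb_memory_bytes(10_000, 1_000)   # 160 MB per in-flight read
peak = peak_estimate(10_000, 1_000, 32)     # ~5.1 GB across 32 threads
```

Under these assumptions, halving the thread count roughly halves the Forward-Backward peak, which matches the observation that memory scales with threads.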
Hi, Thanks for the details.
Hi again @dcopetti, Yes, that could be one option, as long as you also divide the alignment file so that each sub-alignment file includes only the alignments of short reads to the long reads that will be corrected within the same run. Thanks.
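A minimal sketch of the splitting described above: divide the long reads into batches, then route each alignment record into the batch that contains its target long read. The function names and the simplified tab-separated SAM parsing are assumptions for illustration, not Hercules code.

```python
# Hypothetical helper: distribute SAM alignment records across batches
# so each sub-alignment file only covers its own batch of long reads.

def batch_of(read_name, batches):
    """Map a long-read name to its batch index; None if unknown."""
    for i, names in enumerate(batches):
        if read_name in names:
            return i
    return None

def split_alignments(sam_lines, batches):
    """Distribute alignment records by RNAME (3rd SAM column), skipping headers."""
    out = [[] for _ in batches]
    for line in sam_lines:
        if line.startswith('@'):
            continue  # in practice, headers would be copied to every sub-file
        rname = line.split('\t')[2]
        i = batch_of(rname, batches)
        if i is not None:
            out[i].append(line)
    return out
```

In practice one would do this with indexed BAM files (e.g., region queries), but the principle is the same: every short-read alignment must end up in the same run as the long read it maps to.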
Hi,
It will most probably cause an error. Before answering your question, let me describe how it works a little further. This is related to how Hercules generates the preprocessing files and uses the alignment file later, in the correction step. In the case of your question, if an alignment file includes a read id (which must be an integer value) that is not actually present in your set of long reads, it will not cause any problem if and only if this read id is greater than or equal to the number of reads in the long read set. Otherwise, it will definitely point to a read, and Hercules will try to correct that read using the short reads aligning to that read id. Thus, you should always keep the order of the reads in mind; if you change the order of a read, you should also change the REF field accordingly so that the values in the REF field still point to the correct long reads. The code snippet that makes the assumptions I just described is at lines 992 to 1009 in e0a7b39. Note that Hercules reads long reads from the beginning, keeping track of the order of the reads, and checks whether the REF field in the next alignment record matches the long read that is currently being processed.
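The ordering assumption described above can be paraphrased as follows. This is an illustrative Python sketch, not the actual C++ from the referenced lines: long reads are consumed in order, and an alignment record is only used while its integer REF id matches the index of the long read currently being corrected.

```python
# Sketch of the sequential REF-id matching described in the comment.
# `alignments` is assumed to be a list of (ref_id, short_read) pairs
# sorted by ref_id, mirroring a sorted alignment file.

def correct_in_order(num_long_reads, alignments):
    used = [[] for _ in range(num_long_reads)]
    pos = 0
    for current in range(num_long_reads):
        # Consume records only while they point at the current long read.
        while pos < len(alignments) and alignments[pos][0] == current:
            used[current].append(alignments[pos][1])
            pos += 1
    # Records with ref_id >= num_long_reads are silently skipped; a
    # ref_id pointing at the wrong read would corrupt that read's
    # correction, which is why read order must be preserved.
    return used
```

This is why reordering reads without rewriting the REF field misroutes short reads to the wrong long read.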
I see.
I agree, it is better to split the reads beforehand. This was valuable feedback for us as well. I will consider making the parameters more intuitive so that one can run Hercules on a partial set of reads instead of the whole read set, in order to prevent possible overflows. Thanks.
Hello,
From the paper, I see that Hercules needs a lot of memory when correcting WGS data. Which step is the most memory-intensive? Is it perhaps related to the sort step? That would be easier to adapt to smaller computational resources.
Thanks