Unusual large memory needed for very large sized coupled BNDs #232
Comments
Hi @jamigo, it's nice to see that AnnotSV is useful and is being applied to thousands of samples, and thank you for your detailed feedback. In your DRAGEN SV VCF input file, I assume your BNDs use the square-bracketed notation in ALT and that reciprocal BNDs are indicated (MATEID). In any case, AnnotSV seems to struggle with these BND pairs, which correspond to large SVs.
=> Analyzing only the BNDs themselves, rather than each SV across its entire width, should keep AnnotSV from failing. In the future, I'm thinking about integrating a database into the AnnotSV code (#15), which should fix the bug. I'll keep you posted here. Best, Véronique
Note on the square-bracketed ALT notation: its interpretation relies on the homogenization rules from the variantextractor tool (provided by Rodrigo Martin).
AFAIK, AnnotSV handles the square-bracketed SV notation perfectly fine. The problem only appears when AnnotSV deals with several thousand very large, same-chromosome paired BNDs (some >20M, some even >200M). This may be because AnnotSV is not releasing memory when it should (this could be checked), or because the underlying bedtools calls demand a lot of memory (this would be difficult to address). We were in fact thinking about leaving out all the very large, same-chromosome paired BNDs found in this "complicated" region, since they are obviously derived from a feature of the reference genome rather than from features of each sample, but changing the square-bracketed SV notation to
For the record, the "complicated" region, which seems to have high homology across the same chromosome and even with other chromosomes throughout the genome, is chr2:32916100-32916600 (hg38).
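The idea of leaving out these BNDs before annotation could be sketched as a small pre-filter on the VCF. This is only an illustration under stated assumptions: it is not an AnnotSV feature, the 20 Mb cutoff (`MIN_SPAN`) and all function names are hypothetical, and it assumes one breakend per ALT field written in the standard VCF bracket notation. It drops every same-chromosome BND record that has one end inside chr2:32916100-32916600 and a mate at least `MIN_SPAN` away, so both mates of a pair are removed.

```python
import re

# Captures the mate "chrom:pos" from the VCF breakend ALT forms
# t[p[, t]p], ]p]t and [p[t. The greedy group tolerates ":" in contig names.
BND_ALT = re.compile(r"[\[\]](.+):(\d+)[\[\]]")

# Region reported in this thread (hg38), 1-based inclusive.
REGION = ("chr2", 32916100, 32916600)
MIN_SPAN = 20_000_000  # hypothetical cutoff, matching the ">20M" pairs


def mate_position(alt):
    """Return (chrom, pos) of the mate encoded in a BND ALT, or None."""
    m = BND_ALT.search(alt)
    return (m.group(1), int(m.group(2))) if m else None


def in_region(chrom, pos, region=REGION):
    """True if chrom:pos lies inside the region (inclusive bounds)."""
    return chrom == region[0] and region[1] <= pos <= region[2]


def is_problematic(line, min_span=MIN_SPAN):
    """True for same-chromosome BNDs touching REGION with span >= min_span."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, alt = fields[0], int(fields[1]), fields[4]
    mate = mate_position(alt)
    if mate is None or mate[0] != chrom:
        return False  # not a BND, or the mate is on another chromosome
    touches = in_region(chrom, pos) or in_region(*mate)
    return touches and abs(mate[1] - pos) >= min_span


def filter_vcf(lines):
    """Yield header lines and every record that is not flagged."""
    for line in lines:
        if line.startswith("#") or not is_problematic(line):
            yield line
```

In practice the lines would come from the uncompressed (or `gzip.open(..., "rt")`) VCF and the surviving lines would be written to a new file that is then passed to AnnotSV. The removed records could be kept in a separate file for the ongoing study of the region.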
We are currently studying the SVs detected by DRAGEN in 9000+ samples. We are trying to join them all together and annotate them in a single step, an approach that has worked for us in other projects, but we are facing an issue that we don't know how to deal with. This is how we proceeded:
We are therefore studying what is happening on this 500bp chr2 region, but at the same time we would like AnnotSV to help us, and it just can't: AnnotSV breaks when dealing with these several thousands of very large coupled BNDs.
We can process all SVs from all chromosomes plus chr2's except the BNDs on that critical region with only 8GB/chromosome, but we tried annotating these 11000+ coupled BNDs, where 4000+ of them are 20M-30M and 3000+ of them are 200M-210M, and 64GB was not enough.
input.annotated.tsv.tmp ends up being 4.3GB before AnnotSV dies.
input.SV_RE_intersect.tmp ends up being 13.6GB before AnnotSV dies.
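To check how the same-chromosome BND spans are distributed before deciding how much memory to request (for instance, to confirm the 20M-30M and 200M-210M clusters mentioned above), a tally along the following lines could be used. It is a sketch, not part of AnnotSV: the 10 Mb bin width and the function name `span_bins` are hypothetical, it counts records rather than pairs (so each pair contributes two records when both mates are present), and it assumes one breakend per ALT field.

```python
import re
from collections import Counter

# Captures the mate "chrom:pos" from a VCF breakend ALT such as N[chr2:123[
BND_ALT = re.compile(r"[\[\]](.+):(\d+)[\[\]]")
BIN = 10_000_000  # hypothetical bin width (10 Mb)


def span_bins(lines, bin_size=BIN):
    """Count same-chromosome BND records per span bin, keyed by bin start."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        m = BND_ALT.search(fields[4])
        if m is None or m.group(1) != fields[0]:
            continue  # not a BND, or the mate is on another chromosome
        span = abs(int(m.group(2)) - int(fields[1]))
        counts[span // bin_size * bin_size] += 1
    return counts
```

Printing `sorted(span_bins(vcf_lines).items())` gives one row per populated bin, which makes it easy to see which span ranges dominate the input.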
Before trying to solve this problem by increasing the memory through trial and error, we would like to know if there's anything else we can do on our side to work around this issue.