sizes of successful CNVs - smaller than expected #166

Open
RichardCorbett opened this issue Sep 1, 2020 · 5 comments

Comments

@RichardCorbett

Hi there,
I have been using bamsurgeon to simulate germline copy number changes. I am inserting deletions and duplications ranging in size from 100bp to 10Mb.

One of my target sizes is 5kb. When I checked how many of my target variants were successfully integrated, I found that most of the successful ones were smaller than the desired event, usually a deletion of 1kb to 3kb.

This led me to do a test across a range of sizes to see if there was a size "hump" I needed to get over.

I created lists of ~100 homozygous deletions at each target size (1kb, 2kb, ..., 10kb) and checked the size distribution of the successfully integrated variants.

[image: size distribution of successfully integrated deletions at each target size]

Here are some example lines of the variants I am attempting to integrate for the 8Kb tests.

1 5048109 5056109 DEL 1
1 21392935 21400935 DEL 1
1 72189128 72197128 DEL 1
1 77025708 77033708 DEL 1
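
As a quick sanity check on the input, the intended size of each event can be computed from lines in the format shown above. This is a minimal sketch, assuming whitespace-separated columns of chromosome, start, end and type, with anything after the type kept as an opaque string; `Target`, `parse_targets` and `target_size` are hypothetical helper names, not part of bamsurgeon.

```python
from typing import List, NamedTuple


class Target(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str
    extra: str  # remaining columns, left uninterpreted here


def parse_targets(text: str) -> List[Target]:
    """Parse whitespace-separated target lines (assumed column order)."""
    targets = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, start, end, svtype = fields[:4]
        targets.append(
            Target(chrom, int(start), int(end), svtype, " ".join(fields[4:]))
        )
    return targets


def target_size(target: Target) -> int:
    """Intended event size, taken as end minus start."""
    return target.end - target.start


example = """\
1 5048109 5056109 DEL 1
1 21392935 21400935 DEL 1
1 72189128 72197128 DEL 1
1 77025708 77033708 DEL 1
"""

sizes = [target_size(t) for t in parse_targets(example)]
print(sizes)
```

Each of the four example lines spans 8000 bases, so any successfully integrated event much shorter than that falls short of the request.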

It looks like there is a limit in this range capping the sizes of target events around 2800bp. Is there a way to get around this?

thanks,
Richard

@adamewing
Owner

Hi Richard, Sorry to hear you're having trouble - that does look strange. Could you try exchanging "DEL" for "BIGDEL" in the mutation input file and let me know how you go?
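
The suggested swap can be scripted rather than done by hand. The sketch below assumes the mutation input file is whitespace-separated with the event type in the fourth column, as in the example lines earlier in the thread; `swap_del_for_bigdel` is a hypothetical helper, not a bamsurgeon function, and it rewrites the separators as single spaces.

```python
def swap_del_for_bigdel(lines):
    """Return the input lines with a type of DEL changed to BIGDEL."""
    out = []
    for line in lines:
        fields = line.split()
        # Only touch entries whose type column is exactly "DEL".
        if len(fields) >= 4 and fields[3] == "DEL":
            fields[3] = "BIGDEL"
        out.append(" ".join(fields))
    return out


converted = swap_del_for_bigdel(
    [
        "1 5048109 5056109 DEL 1",
        "1 21392935 21400935 DEL 1",
    ]
)
for line in converted:
    print(line)
```

Lines with any other type pass through with only their whitespace normalized.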

@RichardCorbett
Author

Thanks @adamewing,
I'm trying the same test, but this time using only "BIGDEL" events. For the sets with events of sizes 1Kb-4Kb I get an error at the beginning of the run after I get a warning for each of my events:

WARNING 2020-09-02 08:29:33,818 Y 40543929 40547929 BIGDEL 1 is under 5kbp, "BIG" mutation types will yield unpredictable results, converting to DEL
WARNING 2020-09-02 08:29:33,819 Y 40901475 40905475 BIGDEL 1 is under 5kbp, "BIG" mutation types will yield unpredictable results, converting to DEL
WARNING 2020-09-02 08:29:33,819 Y 49701495 49705495 BIGDEL 1 is under 5kbp, "BIG" mutation types will yield unpredictable results, converting to DEL
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/site-packages/bamsurgeon-1.2-py3.6.egg/EGG-INFO/scripts/addsv.py", line 550, in makemut
IndexError: list index out of range
"""
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/addsv.py", line 4, in <module>
    __import__('pkg_resources').run_script('bamsurgeon==1.2', 'addsv.py')
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 654, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1441, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/site-packages/bamsurgeon-1.2-py3.6.egg/EGG-INFO/scripts/addsv.py", line 1358, in <module>
  File "/usr/local/lib/python3.6/site-packages/bamsurgeon-1.2-py3.6.egg/EGG-INFO/scripts/addsv.py", line 1121, in main
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
IndexError: list index out of range

For the set of 5k events it does seem to fire up ok, but appears to fail when trying to insert the variants:

INFO 2020-09-02 08:38:00,539 X_28368363_28368363_BIGDEL removing addsv.tmp/X_28368363_28368363_BIGDEL.wgsimtmp.1df84ab2-e7d5-4acc-9afe-8dfa0a68d67b.2.fq
INFO 2020-09-02 08:38:00,549 X_28368363_28368363_BIGDEL temporary bam: addsv.tmp/X_28368363_28368363_BIGDEL.0cba3dc8-7c31-4548-b461-35bc6f806757.muts.bam
INFO 2020-09-02 08:38:01,144 5_8744659_8744659_BIGDEL best contig length: 2184
INFO 2020-09-02 08:38:01,144 5_8744659_8744659_BIGDEL best transloc contig length: 8676
INFO 2020-09-02 08:38:01,207 5_8744659_8744659_BIGDEL alignment result: ['SUMMARY', '8841', '349', '2128', '2221', '4000']
INFO 2020-09-02 08:38:01,209 5_8744659_8744659_BIGDEL trimmed contig length: 1779
INFO 2020-09-02 08:38:01,209 5_8744659_8744659_BIGDEL start: 8742659, end: 8746659, tgtstart: 2221, tgtend: 4000, refstart: 8744880, refend: 8746659
INFO 2020-09-02 08:38:01,286 5_8744659_8744659_BIGDEL alignment result: ['SUMMARY', '29838', '2574', '8547', '2027', '8000']
INFO 2020-09-02 08:38:01,291 5_8744659_8744659_BIGDEL trimmed contig length: 5973
INFO 2020-09-02 08:38:01,291 5_8744659_8744659_BIGDEL trn_start: 8745659, trn_end: 8753659, trn_tgtstart: 2027, trn_tgtend:8000 , trn_refstart: 8747686, trn_refend: 8753659
WARNING 2020-09-02 08:38:01,292 5_8744659_8744659_BIGDEL best contig too short to make mutation!
Traceback (most recent call last):
  File "/usr/local/bin/addsv.py", line 4, in <module>
    __import__('pkg_resources').run_script('bamsurgeon==1.2', 'addsv.py')
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 654, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1441, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/site-packages/bamsurgeon-1.2-py3.6.egg/EGG-INFO/scripts/addsv.py", line 1358, in <module>
  File "/usr/local/lib/python3.6/site-packages/bamsurgeon-1.2-py3.6.egg/EGG-INFO/scripts/addsv.py", line 1156, in main
  File "/usr/local/lib/python3.6/site-packages/bamsurgeon-1.2-py3.6.egg/EGG-INFO/scripts/addsv.py", line 460, in fetch_read_names
  File "pysam/libcalignmentfile.pyx", line 1081, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 690, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (37330220) > stop (37329220)

The runs for the 6Kb-9Kb events are still going. Hopefully by the end of the day I'll see whether the BIGDEL type helped with those.

@RichardCorbett
Author

For the events that were 6kb-9kb, using BIGDEL seems to have fixed the issue.
[image: size distribution of integrated deletions for the 6kb-9kb sets using BIGDEL]

@adamewing
Owner

OK, thanks for the analysis. The sizes come from the truth VCF, right? The intended behaviour is for addsv to switch to the "bigdel" method automatically when the input target is > 5kbp. Still unclear why you're hitting a limit, will investigate.
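
For readers following along, the size-based switch described here can be illustrated as below. This is only a sketch of the described behaviour, not the code in addsv.py: the 5000 bp threshold is read off the warning text earlier in the thread, `normalize_del_type` is a hypothetical name, and an event of exactly 5000 bp is left unchanged because the thread does not say which side it falls on.

```python
BIG_THRESHOLD = 5000  # 5 kbp, assumed from the warning message in this thread


def normalize_del_type(start, end, svtype, threshold=BIG_THRESHOLD):
    """Pick DEL or BIGDEL from the requested event size (illustrative only)."""
    size = end - start
    if svtype == "BIGDEL" and size < threshold:
        # Mirrors the logged warning: "BIG" types under 5 kbp become DEL.
        return "DEL"
    if svtype == "DEL" and size > threshold:
        # Mirrors the intended automatic switch for targets over 5 kbp.
        return "BIGDEL"
    return svtype


# A 4 kb BIGDEL request from the log and an 8 kb DEL request from the first post.
print(normalize_del_type(40543929, 40547929, "BIGDEL"))
print(normalize_del_type(5048109, 5056109, "DEL"))
```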

@RichardCorbett
Author

Yes, the SV sizes I am pulling out come from the SVLEN tag in the VCF files that were created.
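
One way to pull those sizes out is sketched below in plain Python, assuming a standard tab-separated VCF with the INFO field in column 8 and SVLEN possibly negative for deletions. The record in `demo` is made up for illustration and is not output from this run; `svlen_from_info` and `sv_sizes` are hypothetical helper names.

```python
def svlen_from_info(info):
    """Return the absolute SVLEN from a VCF INFO string, or None if absent."""
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == "SVLEN" and sep:
            # SVLEN may be a comma-separated list; use the first value.
            return abs(int(value.split(",")[0]))
    return None


def sv_sizes(vcf_lines):
    """Collect SVLEN-based sizes from the data lines of a VCF."""
    sizes = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        size = svlen_from_info(cols[7])
        if size is not None:
            sizes.append(size)
    return sizes


# Made-up record, for illustration only.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=3000;SVLEN=-2000",
]
print(sv_sizes(demo))
```

Taking the absolute value keeps deletions and duplications on the same scale when plotting a size distribution.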
