Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bamsurgeon addsnv.py takes 4 hours to add one SNP to a bam file of size 3.1MB ? #225

Open
TrangNg-Th opened this issue May 14, 2024 · 0 comments

Comments

@TrangNg-Th
Copy link

Hi,
I recently used Bamsurgeon to add an SNP at a specific location for a specific region of the bam file. I processed the bam file and extracted a smaller bam file that contains only a small region around the SNP that I'm interested in. However, upon running the script, bamsurgeon addsnp.py takes around 4 hours for one last step: "merging reads".
My question is: What does this merging step do, and is this step important for the final modified bam file?

For clarification, here was the script that I used:
singularity exec
--bind $extracted_reads_dir:/extracted_reads
--bind $ref_fasta_dir:/ref_Macaq
--bind $input_snp_dir:/input_snps
--bind $output_snp_dir:/output_snp
--bind $tmp_dir:/tmp
bamsurgeon.sif python3
-O bamsurgeon/bin/addsnv.py
--procs 1
-v /input_snps/$test_snv
-f /extracted_reads/$filename
-r /ref_Macaq/modif_ref-relabeled.fna
--vcf /tmp/${sample}${chromosome}${start_pos}-${end_pos}modif.vcf
-o /output_snp/${sample}
${chromosome}_${start_pos}-${end_pos}_modif.bam
--tmpdir /tmp
--aligner mem
--seed 1234
--mindepth 3
--coverdiff 0.1
--ignoreref
--picardjar /picard.jar

The messages given by bamsurgeon:
16:25:16.194 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat May 11 16:25:16 EDT 2024] SamToFastq INPUT=/tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.bam FASTQ=/tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.1.fastq SECOND_END_FASTQ=/tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.2.fastq INCLUDE_NON_PRIMARY_ALIGNMENTS=false VALIDATION_STRINGENCY=SILENT OUTPUT_PER_RG=false COMPRESS_OUTPUTS_PER_RG=false RG_TAG=PU RE_REVERSE=true INTERLEAVE=false INCLUDE_NON_PF_READS=false CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat May 11 16:25:16 EDT 2024] Executing as ****** on Linux 4.18.0-513.9.1.el8_9.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.22+7-post-Ubuntu-0ubuntu220.04.1; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.3
[Sat May 11 16:25:17 EDT 2024] picard.sam.SamToFastq done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=5821693952
INFO 2024-05-11 16:25:17,766 haplo_chr10_10571334_10571334 aligning /tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.1.fastq,/tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.2.fastq with mem
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 76 sequences (11476 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 38, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (380, 434, 479)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (182, 677)
[M::mem_pestat] mean and std.dev: (432.43, 74.17)
[M::mem_pestat] low and high boundaries for proper pairs: (83, 776)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 76 reads in 0.039 CPU sec, 0.041 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 1 -M -Y /ref_Macaq/modif_ref-relabeled.fna /tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.1.fastq /tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.2.fastq
[main] Real time: 5.075 sec; CPU: 3.588 sec
INFO 2024-05-11 16:25:23,580 haplo_chr10_10571334_10571334 avgincover: 38.000000, avgoutcover: 38.000000
INFO 2024-05-11 16:25:23,753 done making mutations, merging mutations into /extracted_reads/39237_chr10_10566334-10576334_sampled.bam --> /output_snp/39237_chr10_10566334-10576334_modif.bam
INFO 2024-05-11 20:30:28,614 loading donor reads into dictionary...
INFO 2024-05-11 20:30:28,652 secondary reads count:0
INFO 2024-05-11 20:30:28,652 supplementary reads count:0
INFO 2024-05-11 20:30:28,653 loaded 76 reads, (0 excluded, 0 null or secondary or supplementary--> ignored)

INFO 2024-05-11 20:30:28,746 replaced 76 reads (0 excluded )

INFO 2024-05-11 20:30:28,746 kept 0 secondary reads.

INFO 2024-05-11 20:30:28,747 kept 0 supplementary reads.

INFO 2024-05-11 20:30:28,747 ignored 0 non-primary reads in target BAM.

INFO 2024-05-11 20:30:29,559 vcf output written to /tmp/39237_chr10_10566334-10576334_modif.vcf39237_chr10_10566334-10576334_modif.addsnv.snp_chr10_10571334.vcf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant