bottleneck in bwa_sam_PE #15

Open
biomystery opened this issue May 2, 2018 · 3 comments

Comments

@biomystery
Contributor

Generating /home/zhc268/scratch/outputs/AVD_146/align/rep1/AVD_146_R1.PE2SE.bam takes a long time for a 1 GB fastq.gz. The slow step is:

bwa sampe /projects/ps-epigen/GENOME/hg38/bwa_index/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta /home/zhc268/scratch/outputs/AVD_146/align/rep1/AVD_146_R1.sai /home/zhc268/scratch/outputs/AVD_146/align/rep1/AVD_146_R2.sai /home/zhc268/data/seqdata/AVD_146_R1.fastq.gz /home/zhc268/data/seqdata/AVD_146_R2.fastq.gz | pigz -p 5 -nc > /home/zhc268/scratch/outputs/AVD_146/align/rep1/AVD_146_R1.PE2SE.sam.gz
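To relate the 1 GB input to the batch counter in the log below, here is a quick sketch (not part of the pipeline) that counts FASTQ records and estimates how many batches bwa sampe will work through. It assumes standard 4-line FASTQ records. The batch size of 262144 is inferred from the "sequences have been processed" lines in the log (262144, 524288), not taken from bwa documentation, and the helper names `count_reads` and `estimate_batches` are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: batch size inferred from the log's "sequences have been processed" lines.
BATCH=262144

count_reads() {
  # Decompress to stdout and divide the line count by 4 (assumes 4-line records).
  gzip -cd "$1" | awk 'END { print NR / 4 }'
}

estimate_batches() {
  # Ceiling division: number of reads per mate file / batch size.
  local reads=$1
  echo $(( (reads + BATCH - 1) / BATCH ))
}

# Usage (path from the issue):
# reads=$(count_reads /home/zhc268/data/seqdata/AVD_146_R1.fastq.gz)
# estimate_batches "$reads"
```

Multiplying the batch count by the per-batch time seen in the log gives a rough estimate of the total run time.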

@biomystery
Contributor Author

biomystery commented May 3, 2018

Typical output (bwa sampe stderr):
[bwa_read_seq] 0.0% bases are trimmed.
[bwa_read_seq] 0.0% bases are trimmed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
[infer_isize] (25, 50, 75) percentile: (148, 178, 222)
[infer_isize] low and high boundaries: 36 and 370 for estimating avg and std
[infer_isize] inferred external isize from 114564 pairs: 186.963 +/- 54.142
[infer_isize] skewness: 0.998; kurtosis: 0.757; ap_prior: 3.38e-05
[infer_isize] inferred maximum insert size: 564 (6.96 sigma)
[bwa_sai2sam_pe_core] time elapses: 1305.81 sec
[bwa_sai2sam_pe_core] changing coordinates of 32658 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 2136 out of 4067 Q17 singletons are mated.
[bwa_paired_sw] 1457 out of 2864 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 1.00 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 0.21 sec
[bwa_sai2sam_pe_core] print alignments... 1.14 sec
[bwa_sai2sam_pe_core] 262144 sequences have been processed.
[bwa_read_seq] 0.0% bases are trimmed.
[bwa_read_seq] 0.0% bases are trimmed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
[infer_isize] (25, 50, 75) percentile: (148, 178, 221)
[infer_isize] low and high boundaries: 36 and 367 for estimating avg and std
[infer_isize] inferred external isize from 113939 pairs: 186.838 +/- 53.827
[infer_isize] skewness: 0.978; kurtosis: 0.681; ap_prior: 2.99e-05
[infer_isize] inferred maximum insert size: 561 (6.96 sigma)
[bwa_sai2sam_pe_core] time elapses: 1394.13 sec
[bwa_sai2sam_pe_core] changing coordinates of 32672 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 1981 out of 3913 Q17 singletons are mated.
[bwa_paired_sw] 1574 out of 2937 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 0.42 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 0.17 sec
[bwa_sai2sam_pe_core] print alignments... 1.11 sec
[bwa_sai2sam_pe_core] 524288 sequences have been processed.
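The log already shows where the time goes: in each batch the "time elapses" line after "convert to sequence coordinate..." is over 1300 s, while the one after "align unmapped mate..." is about a second. Below is a small awk sketch that splits those two steps per batch from a saved stderr log. It assumes the line layout shown above (two "time elapses" lines per batch, then a "sequences have been processed" line); `summarize_batches` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Summarize per-batch timings from a saved bwa sampe stderr log (e.g. ./tmp.txt).
summarize_batches() {
  awk '
    /convert to sequence coordinate/ { step = "convert" }
    /align unmapped mate/            { step = "mate" }
    /time elapses:/ {
      # The field after "elapses:" holds the seconds.
      for (i = 1; i <= NF; i++) if ($i == "elapses:") secs = $(i + 1)
      if (step == "convert") conv += secs
      else if (step == "mate") mate += secs
    }
    /sequences have been processed/ {
      # $2 is the cumulative number of sequences reported by bwa.
      batch++
      printf "batch %d: %d seqs so far, convert %.2f s, mate %.2f s\n", batch, $2, conv, mate
      tot_conv += conv; tot_mate += mate; conv = 0; mate = 0
    }
    END { printf "total: convert %.2f s, mate %.2f s\n", tot_conv, tot_mate }
  ' "$1"
}

# Usage: summarize_batches ./tmp.txt
```

On the two batches above, this attributes about 2700 s to coordinate conversion and under 2 s to mate alignment.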

@biomystery
Contributor Author

Summing every "time elapses" entry in the full log (saved as ./tmp.txt) gives the total in hours:
$ grep time ./tmp.txt | cut -d':' -f 2 | awk '{sum += $1} END {print sum/3600}'
7.49306

@biomystery
Contributor Author

From the BWA manual (http://bio-bwa.sourceforge.net/bwa.shtml):

Pairing is slower for shorter reads. This is mainly because shorter reads have more spurious hits and converting SA coordinates to chromosomal coordinates are very costly.
In this case, the reads are only 36 bp long.
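To confirm the read length directly from the input, here is a sketch that tabulates sequence lengths over the first N records of a gzipped FASTQ. It assumes standard 4-line records (the sequence is line 2 of each record); `read_length_hist` is a hypothetical helper name, and the default of 10000 records is arbitrary.

```shell
#!/usr/bin/env bash
# Print "count length" pairs for the first N records of a gzipped FASTQ.
read_length_hist() {
  local fq=$1 n=${2:-10000}
  gzip -cd "$fq" | head -n $(( n * 4 )) \
    | awk 'NR % 4 == 2 { print length($0) }' \
    | sort -n | uniq -c
}

# Usage (path from the issue):
# read_length_hist /home/zhc268/data/seqdata/AVD_146_R1.fastq.gz 10000
```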
