-
Notifications
You must be signed in to change notification settings - Fork 6
Using an Alternative Aligner
Any SV caller relies fundamentally on the correctness of the aligner's alignments output. If an aligner gives an suboptimal alignment, the deduced SV(s) will be meaningless.
Picky uses LAST to generate all possible High-scoring Segment Pairs (HSPs) for a read, and 'pick'-and-stitch the segments into the representative alignments with a greedy algorithm instead of using last-split. Picky's picking does not assume colinearity nor mandate that each read base must only be aligned to at most a single genomic location.
LAST was motivated for rapid genome-to-genome alignment. Think an extremely-ultra-long read vs another extremely-ultra-long read. Although it does the job exceptionally well, it has not tapped in the raw speed available with each new computing techonology cycles that we are seeing amongst the new breed of long-read aligners such as NGMLR (which is fast) and Minimap2 (which is blazing fast).
To allow users to work with their favorite and most familiar aligner, we have added Picky sam2align. The latter transforms a .sam stream to .align format which Picky callSV consumes to perform the structural vairants calling.
Here is the content of the Picky script LongRead.sh with added step numbering from the QuickStart guide which uses LAST for alignment and Picky selectRep to select the representative alignment. (See STEP 1 reads alignments in the following code block.)
# STEP 0.1 general installation
export LASTAL=last-755/src/lastal
export PICKY=./picky.pl
# STEP 0.2 general configuration
export LASTALDB=hg19.lastdb
export LASTALDBFASTA=hg19.fa
export NTHREADS=4
# STEP 0.3 FASTQ file
export RUN=LongRead
# STEP 1 reads alignments
time (${LASTAL} -C2 -K2 -r1 -q3 -a2 -b1 -v -v -P${NTHREADS} -Q1 ${LASTALDB} ${RUN}.fastq 2>${RUN}.lastal.log | ${PICKY} selectRep --thread ${NTHREADS} --preload 6 1>${RUN}.align 2>${RUN}.selectRep.log)
# STEP 2 call SVs
time (cat ${RUN}.align | ${PICKY} callSV --oprefix ${RUN} --fastq ${RUN}.fastq --genome ${LASTALDBFASTA} --exclude=chrM --sam)
# STEP 3 generate VCF format
${PICKY} xls2vcf --xls ${RUN}.profile.DEL.xls --xls ${RUN}.profile.INS.xls --xls ${RUN}.profile.INDEL.xls --xls ${RUN}.profile.INV.xls --xls ${RUN}.profile.TTLC.xls --xls ${RUN}.profile.TDSR.xls --xls ${RUN}.profile.TDC.xls > ${RUN}.allsv.vcf
To try out NGMLR, you essentially update the setup in STEP 0.1, the alignment instruction in STEP 1, and finally transformation of the NGMLR generated .sam file to .align for picky callSV in STEP 2. All the rest of the script remains unchanged.
NOTE: We skip the first headline '@HD' as Picky sam2align requires the provided .sam content to be query name sorted. In addition, we omit "--fastq ${RUN}.fastq", "--genome ${LASTALDBFASTA}", and "--sam" as there is no need to generate a sam when we have one to begin with.
# STEP 0.1 general installation - CHANGED
export NGMLR=./ngmlr
export PICKY=./picky.pl
# STEP 0.2 general configuration
export LASTALDBFASTA=hg19.fa
export NTHREADS=4
# STEP 0.3 FASTQ file
export RUN=LongRead
# STEP 1 reads alignments - CHANGED
time (${NGMLR} -t ${NTHREADS} -r ${LASTALDBFASTA} -q ${RUN}.fastq -o ${RUN}.ngmlr.sam -x ont)
# STEP 2 call SVs - CHANGED
time (tail -n +2 ${RUN}.ngmlr.sam | ${PICKY} sam2align | ${PICKY} callSV --oprefix ${RUN} --exclude=chrM)
# STEP 3 generate VCF format
${PICKY} xls2vcf --xls ${RUN}.profile.DEL.xls --xls ${RUN}.profile.INS.xls --xls ${RUN}.profile.INDEL.xls --xls ${RUN}.profile.INV.xls --xls ${RUN}.profile.TTLC.xls --xls ${RUN}.profile.TDSR.xls --xls ${RUN}.profile.TDC.xls > ${RUN}.allsv.vcf
To try out Minimap2, you essentially update the setup in STEP 0.1, the alignment instruction in STEP 1, and finally the transformation of the Minimap2 generated .sam file to .align for picky callSV in STEP 2. All the rest of the script remains unchanged.
NOTE: There is no '@HD' in Minimap generated .sam and the content is already query-blocked. This meets the minimal requirement for Picky sam2align. In addition, we omit "--fastq ${RUN}.fastq", "--genome ${LASTALDBFASTA}", and "--sam" as there is no need to generate a sam when we have one to begin with.
# STEP 0.1 general installation - CHANGED
export MINIMAP2=./minimap2
export PICKY=./picky.pl
# STEP 0.2 general configuration
export LASTALDBFASTA=hg19.fa
export NTHREADS=4
# STEP 0.3 FASTQ file
export RUN=LongRead
# STEP 1 reads alignments - CHANGED
time (${MINIMAP2} -t ${NTHREADS} -ax map-ont ${LASTALDBFASTA} ${RUN}.fastq > ${RUN}.minimap2.sam)
# STEP 2 call SVs - CHANGED
time (cat ${RUN}.minimap2.sam | ${PICKY} sam2align | ${PICKY} callSV --oprefix ${RUN} --exclude=chrM)
# STEP 3 generate VCF format
${PICKY} xls2vcf --xls ${RUN}.profile.DEL.xls --xls ${RUN}.profile.INS.xls --xls ${RUN}.profile.INDEL.xls --xls ${RUN}.profile.INV.xls --xls ${RUN}.profile.TTLC.xls --xls ${RUN}.profile.TDSR.xls --xls ${RUN}.profile.TDC.xls > ${RUN}.allsv.vcf