Skip to content

Commit

Permalink
version 2.0
Browse files Browse the repository at this point in the history
  • Loading branch information
Pombert-JF committed Mar 16, 2021
1 parent a3c8949 commit 67d4d95
Show file tree
Hide file tree
Showing 2 changed files with 54 additions and 79 deletions.
18 changes: 7 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,6 @@ Assessing the genetic diversity between genomes often involves the calculation o
- [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) 2.0+
- [NGMLR](https://github.com/philres/ngmlr) 0.2.7+ (partial support)
- [BCFtools](http://samtools.github.io/bcftools/) 1.3.1+
- [FreeBayes](https://github.com/ekg/freebayes)

#### Downloading from GitHub
```bash
Expand Down Expand Up @@ -223,14 +222,13 @@ Options for [get_SNPs.pl](https://github.com/PombertLab/SSRG/blob/master/get_SNP
-ns (--no_stats) Do not calculate stats; stats can take a while to compute for large eukaryote genomes
# Mapper-specific options
-X BOWTIE2 - Maximum paired ends insert size [default: 750]
-preset MINIMAP2 - Preset: sr, map-ont, map-pb or asm20 [default: sr]
-algo BWA - Mapping algorithm: bwasw, mem, samse [default: mem]
-X BOWTIE2 - Maximum paired ends insert size [default: 750]
### Variant calling options ###
-caller [default: varscan2] ## Variant caller: varscan2, bcftools or freebayes
# Variant calling options
-caller [default: varscan2] ## Variant caller: varscan2 or bcftools
-type [default: snp] ## snp, indel, or both
-ploidy [default: 1] ## FreeBayes/BCFtools option; change ploidy (if needed)
-ploidy [default: 1] ## BCFtools option; change ploidy (if needed)
# VarScan2 parameters - see http://dkoboldt.github.io/varscan/using-varscan.html
-var [default: VarScan.v2.4.4.jar] ## Which varscan jar file to use
Expand Down Expand Up @@ -445,10 +443,8 @@ This work was supported in part by the National Institute of Allergy and Infecti

8. Ondov BD, Treangen TJ, Mallonee AB, Bergman NH, Koren S, Phillippy AM. **Fast genome and metagenome distance estimation using MinHash.** 2016. *Genome Biol.* 17, 132. PMID: [27323842](https://pubmed.ncbi.nlm.nih.gov/27323842) PMCID: [PMC4915045](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4915045) DOI: [10.1186/s13059-016-0997-x](https://doi.org/10.1186/s13059-016-0997-x)

9. Garrison E, Marth G. **Haplotype-based variant detection from short-read sequencing.** *arXiv* Prepr. 2012. 9, [arXiv1207.3907](https://arxiv.org/abs/1207.3907).

10. R Core Team. **R: A language and environment for statistical computing.** 2021. [URL](https://www.r-project.org/).
9. R Core Team. **R: A language and environment for statistical computing.** 2021. [URL](https://www.r-project.org/).

11. van der Maaten LJP. **Barnes-Hut-SNE.** *arxiv.org* 2013. [1301.3342v2](https://arxiv.org/abs/1301.3342)
10. van der Maaten LJP. **Barnes-Hut-SNE.** *arxiv.org* 2013. [1301.3342v2](https://arxiv.org/abs/1301.3342)

12. van der Maaten LJP, Hinton GE. **Visualizing High-Dimensional Data Using t-SNE.** *J. Mach. Learn. Res.* 2008. 9, 2579–2605. [URL](https://jmlr.csail.mit.edu/papers/volume9/vandermaaten08a/vandermaaten08a.pdf)
11. van der Maaten LJP, Hinton GE. **Visualizing High-Dimensional Data Using t-SNE.** *J. Mach. Learn. Res.* 2008. 9, 2579–2605. [URL](https://jmlr.csail.mit.edu/papers/volume9/vandermaaten08a/vandermaaten08a.pdf)
115 changes: 47 additions & 68 deletions get_SNPs.pl
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
#!/usr/bin/perl
## Pombert JF, Illinois Tech - 2020
my $version = '2.0 alpha RC4';
my $version = '2.0';
my $name = 'get_SNPs.pl';
my $updated = '15/03/2021; Code minification + readability';
my $updated = '16/03/2021';

use strict; use warnings; use File::Basename; use Getopt::Long qw(GetOptions);

Expand All @@ -20,7 +20,7 @@
EXAMPLES:
simple ${name} -fa *.fasta -fq *.fastq -o RESULTS
advanced ${name} --fasta *.fasta --fastq *.fastq --mapper bowtie2 --caller varscan2 --type both --var ./VarScan.v2.4.3.jar --threads 16
paired ends ${name} --fasta *.fasta --pe1 *R1.fastq --pe2 *R2.fastq --X 1000 --mapper bowtie2 --caller freebayes --threads 16
paired ends ${name} --fasta *.fasta --pe1 *R1.fastq --pe2 *R2.fastq --X 1000 --mapper bowtie2 --caller bcftools --threads 16
read-mapping only ${name} --fasta *.fasta --pe1 *R1.fastq --pe2 *R2.fastq --X 1000 --mapper bowtie2 --rmo --bam --threads 16
USAGE
die "\n$usage\n$hint\n" unless@ARGV;
Expand All @@ -32,7 +32,7 @@
-v (--version) Display script version
-o (--outdir) Output directory [Default: ./]
### Mapping options ###
# Mapping options
-fa (--fasta) Reference genome(s) in fasta file
-fq (--fastq) Fastq reads (single ends) to be mapped against reference(s)
-pe1 Fastq reads #1 (paired ends) to be mapped against reference(s)
Expand All @@ -46,17 +46,16 @@
-rmo (--read_mapping_only) Do not perform variant calling; useful when only interested in bam/sam files and/or mapping stats
-ns (--no_stats) Do not calculate stats; stats can take a while to compute for large eukaryote genomes
### Mapper-specific options ###
-X BOWTIE2 - Maximum paired ends insert size [default: 750]
# Mapper-specific options
-preset MINIMAP2 - Preset: sr, map-ont, map-pb or asm20 [default: sr]
-algo BWA - Mapping algorithm: bwasw, mem, samse [default: mem]
-X BOWTIE2 - Maximum paired ends insert size [default: 750]
### Variant calling options ###
-caller [default: varscan2] ## Variant caller: varscan2, bcftools or freebayes
# Variant calling options
-caller [default: varscan2] ## Variant caller: varscan2 or bcftools
-type [default: snp] ## snp, indel, or both
-ploidy [default: 1] ## FreeBayes/BCFtools option; change ploidy (if needed)
-ploidy [default: 1] ## BCFtools option; change ploidy (if needed)
### VarScan2 parameters ### see http://dkoboldt.github.io/varscan/using-varscan.html
# VarScan2 parameters - see http://dkoboldt.github.io/varscan/using-varscan.html
-var [default: VarScan.v2.4.4.jar] ## Which varscan jar file to use
-mc (--min-coverage) [default: 15] ## Minimum read depth at a position to make a call
-mr (--min-reads2) [default: 15] ## Minimum supporting reads at a position to call variants
Expand Down Expand Up @@ -87,9 +86,8 @@
my $nostat;

## Mapper-specific
my $maxins = '750';
my $preset = 'sr';
my $algo = 'mem';
my $maxins = '750';

## Variant calling
my $caller = 'varscan2';
Expand Down Expand Up @@ -126,9 +124,8 @@
'ns|no_stats' => \$nostat,

## Mapper-specific
'X=i' => \$maxins,
'preset=s' => \$preset,
'algo=s' => \$algo,
'X=i' => \$maxins,

## Variant calling
'caller=s' => \$caller,
Expand All @@ -154,9 +151,9 @@
if ($mapper =~ /bowtie2|hisat2|minimap2|ngmlr/){ chkinstall($mapper);}
else {die "\nMapper option $mapper is unrecognized. Please use bowtie2, hisat2, minimap2 or ngmlr...\n\n";}
unless ($rmo){
if ($caller =~ /freebayes|bcftools/){ chkinstall($caller);}
if ($caller =~ /bcftools/){ chkinstall($caller);}
elsif ($caller eq 'varscan2'){ unless (-f $varjar){ die "Cannot find varscan jar file: $varjar\n";} }
else {die "\nVariant caller option $caller is unrecognized. Please use varscan2, bcftools, freebayes...\n\n";}
else {die "\nVariant caller option $caller is unrecognized. Please use varscan2 or bcftools...\n\n";}
}

## Option checks
Expand Down Expand Up @@ -464,66 +461,48 @@ sub variant{
else{
print "Calling variants with $caller on $fasta...\n\n";
if ($caller eq 'varscan2'){
if (($type eq 'snp')||($type eq 'indel')){
system "samtools \\
mpileup \\
-f $fasta \\
$bam_file \\
| \\
java -jar $varjar \\
mpileup2$type \\
--min-coverage $mc \\
--min-reads2 $mr \\
--min-avg-qual $maq \\
--min-var-freq $mvf \\
--min-freq-for-hom $mhom \\
--p-value $pv \\
--strand-filter $sf \\
--output-vcf \\
> $vcf_file";
}
my $vflag = '';
my $cns;
if (($type eq 'snp')||($type eq 'indel')){ $cns = $type; }
elsif ($type eq 'both'){
system "samtools \\
mpileup \\
-f $fasta \\
$bam_file \\
| \\
java -jar $varjar \\
mpileup2cns \\
--min-coverage $mc \\
--min-reads2 $mr \\
--min-avg-qual $maq \\
--min-var-freq $mvf \\
--min-freq-for-hom $mhom \\
--p-value $pv \\
--strand-filter $sf \\
--variants \\
--output-vcf \\
> $vcf_file";
$cns = 'cns';
$vflag = '--variants';
}
else {die "\nERROR: Unrecognized variant type. Please use: snp, indel, or both\n\n";}
system "samtools \\
mpileup \\
-f $fasta \\
$bam_file \\
| \\
java -jar $varjar \\
mpileup2$cns \\
--min-coverage $mc \\
--min-reads2 $mr \\
--min-avg-qual $maq \\
--min-var-freq $mvf \\
--min-freq-for-hom $mhom \\
--p-value $pv \\
--strand-filter $sf \\
$vflag \\
--output-vcf \\
> $vcf_file";
}
elsif ($caller eq 'bcftools'){ ## tested with 1.10.2
unless ($type =~ /snp|indel|both/){ die "\nERROR: Unrecognized variant type. Please use: snp, indel, or both\n\n"; }
my $varV = '';
if ($type eq 'snp'){ $varV = '-V indels'; }
elsif ($type eq 'indel') { $varV = '-V snps'; }
system "bcftools \\
mpileup \\
-f $fasta \\
$bam_file \\
| \\
bcftools \\
call \\
-vmO v \\
$varV \\
--ploidy $ploidy \\
--output $vcf_file";
}
elsif ($caller eq 'freebayes'){ ## single thread only, parallel version behaving wonky
system "samtools index $bam_file";
system "freebayes -f $fasta -p $ploidy $bam_file > $vcf_file";
unless ($index eq 'csi') { system "rm $bam_file.bai"; }
mpileup \\
-f $fasta \\
$bam_file \\
| \\
bcftools \\
call \\
-vmO v \\
$varV \\
--ploidy $ploidy \\
--output $vcf_file";
}
}
}
Expand Down Expand Up @@ -666,7 +645,7 @@ sub print_options{
print MAP " P-value threshold for calling variants: $pv\n";
print MAP " Strand filter: $sf\t\#\# 1 ignores variants with >90% support on one strand\n";
}
if ($caller eq "bcftools|freebayes"){print MAP "Setting ploidy to: $ploidy\n";}
if ($caller eq "bcftools"){print MAP "Setting ploidy to: $ploidy\n";}
if ($type eq 'snp'){print MAP "Searching for SNPs...\n";}
elsif ($type eq 'indel') {print MAP "Searching for indels...\n";}
elsif ($type eq 'both') {print MAP "Searching for SNPs and indels...\n";}
Expand Down

0 comments on commit 67d4d95

Please sign in to comment.