NanoVar is a genomic structural variant (SV) caller that uses low-depth long-read sequencing data, such as reads from Oxford Nanopore Technologies (ONT). It characterizes homozygous SVs using only 4x sequencing depth and heterozygous SVs using only 8x depth.
- Performs long-read mapping (Minimap2) and SV discovery in a single pipeline.
- Accurately characterizes SVs using long sequencing reads (High SV recall and precision in simulation datasets, overall F1 score >0.9)
- Characterizes six classes of SVs including novel-sequence insertion, deletion, inversion, tandem duplication, sequence transposition (TPO) and translocation (TRA).
- Requires 4x and 8x sequencing depth for detecting homozygous and heterozygous SVs respectively.
- Rapid computational speed (takes <3 hours to map and analyze a 12-gigabase dataset (4x) using 24 CPU threads)
- Approximates SV genotype
- Identifies full-length LINE and SINE insertions (Marked by "TE=" in the INFO column of VCF file)
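The depth figures in the list above follow from simple arithmetic: mean depth is the total number of sequenced bases divided by the genome size. The sketch below assumes a round 3-gigabase human genome. That figure, and the function and constant names, are illustrative assumptions and not part of NanoVar.

```python
# Assumption: round figure for the size of a human genome, in base pairs.
HUMAN_GENOME_BP = 3.0e9


def mean_depth(total_bases, genome_size=HUMAN_GENOME_BP):
    """Mean sequencing depth = total sequenced bases / genome size."""
    return total_bases / genome_size


def bases_needed(target_depth, genome_size=HUMAN_GENOME_BP):
    """Total bases required to reach a target mean depth."""
    return target_depth * genome_size


print(mean_depth(12e9))        # 4.0 (a 12-gigabase dataset is about 4x)
print(bases_needed(8) / 1e9)   # 24.0 (gigabases for about 8x)
```

Under this assumption, the 8x depth needed for heterozygous SVs corresponds to roughly twice the data of the 4x case.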
nanovar [Options] -t 24 -f hg38 sample.fq/sample.bam ref.fa working_dir
Parameter | Argument | Comment |
---|---|---|
-t | num_threads | Indicate number of CPU threads to use |
-f (Optional) | gap_file (Optional) | Choose built-in gap BED file or specify own file to exclude gap regions in the reference genome. Built-in gap files include: hg19, hg38 and mm10 |
- | sample.fq/sample.bam | Input long-read FASTA/FASTQ file or mapped BAM file |
- | ref.fa | Input reference genome in FASTA format |
- | working_dir | Specify working directory |
See the wiki for the full list of options.
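For scripted runs, the command line shown above can be assembled in Python. This is a minimal sketch that only uses the arguments documented in the table (`-t`, `-f` and the three positional arguments); the helper name `build_nanovar_command` is hypothetical and not part of NanoVar.

```python
import shlex


def build_nanovar_command(reads, reference, workdir, threads=24, gap_file=None):
    """Assemble the documented NanoVar command line as an argument list."""
    cmd = ["nanovar", "-t", str(threads)]
    if gap_file is not None:  # -f is optional
        cmd += ["-f", gap_file]
    cmd += [reads, reference, workdir]
    return cmd


cmd = build_nanovar_command("sample.fq", "ref.fa", "working_dir", gap_file="hg38")
print(shlex.join(cmd))
# To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.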
Output file | Comment |
---|---|
${sample}.nanovar.pass.vcf | Final VCF filtered output file (1-based) |
${sample}.nanovar.pass.report.html | HTML report showing run summary and statistics |
For more information, see wiki.
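Since LINE and SINE insertions are marked by "TE=" in the INFO column of the VCF file, they can be pulled out with a few lines of Python. The sketch below only tests for the presence of a `TE` key. The example records, including the `TE` value shown, are made up for illustration and are not real NanoVar output.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    info = {}
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def te_records(vcf_lines):
    """Yield (chrom, pos, info) for records whose INFO column has a TE key."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        info = parse_info(fields[7])
        if "TE" in info:
            yield fields[0], int(fields[1]), info


# Hypothetical records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10500\tsv1\tN\t<INS>\t.\tPASS\tSVTYPE=INS;TE=LINE",
    "chr2\t20500\tsv2\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL",
]
for chrom, pos, info in te_records(example):
    print(chrom, pos, info["TE"])
```

To run it on a real file, pass an open file handle instead of the list, for example `te_records(open("sample.nanovar.pass.vcf"))`.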
- Linux (x86_64 architecture, tested in Ubuntu 14.04, 16.04, 18.04)
There are three ways to install NanoVar:
# Installing from bioconda automatically installs all dependencies
conda install -c bioconda nanovar
# Installing from PyPI requires own installation of dependencies, see below
pip install nanovar
# Installing from GitHub requires own installation of dependencies, see below
git clone https://github.com/cytham/nanovar.git
cd nanovar
pip install .
- bedtools >=2.26.0
- samtools >=1.3.0
- minimap2 >=2.17
Please make sure each executable binary is in PATH.
Please refer to each tool's own documentation for installation instructions.
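A quick way to confirm that the three dependencies are reachable from PATH is shown below. This is a minimal sketch using the standard library's `shutil.which`. It checks presence only, not the minimum versions listed above, and the helper name `missing_tools` is hypothetical.

```python
import shutil

# Executables listed as dependencies in this README.
REQUIRED_TOOLS = ("bedtools", "samtools", "minimap2")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found in PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found in PATH:", ", ".join(missing))
else:
    print("All required executables are in PATH.")
```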
See wiki for more information.
See CHANGELOG
If you use NanoVar, please cite:
Tham, CY., Tirado-Magallanes, R., Goh, Y. et al. NanoVar: accurate characterization of patients’ genomic structural variants using low-depth nanopore sequencing. Genome Biol. 21, 56 (2020). https://doi.org/10.1186/s13059-020-01968-7
- Tham Cheng Yong - cytham
- Roberto Tirado Magallanes - rtmag
- Touati Benoukraf - benoukraflab
This project is licensed under the GNU General Public License. See LICENSE.txt for details.
SV simulation datasets used in the manuscript can be downloaded here. Scripts used for simulation dataset generation and tool performance comparison are available here.
Although NanoVar is provided with a universal model and threshold score, instructions for building a custom neural-network model are available here.
- The inaccurate basecalling of large homopolymer or low complexity DNA regions may result in the false determination of deletion SVs. We advise the use of up-to-date ONT basecallers such as Dorado to minimize this possibility.
- For BND SVs, NanoVar is unable to calculate the actual number of SV-opposing reads (normal reads) at the novel adjacency because the two breakends come from distant locations. In balanced and unbalanced variants, it is not clear whether the novel adjacency is derived from both breakends or only one of them, so it is not possible to know which breakend location(s) to consider when counting normal reads. Currently, NanoVar approximates the normal read count by the minimum count from either breakend location. Although this helps in capturing unbalanced BNDs, it might lead to some false positives.
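The approximation described for BND SVs can be sketched as follows. The first function mirrors the stated rule (the minimum normal read count across the two breakend locations). The second is an illustrative helper only and is not NanoVar's genotyping formula. All names and counts are hypothetical.

```python
def approx_normal_reads(normal_at_breakend1, normal_at_breakend2):
    """Approximate the normal read count as the minimum across both breakends."""
    return min(normal_at_breakend1, normal_at_breakend2)


def sv_read_fraction(sv_reads, normal_reads):
    """Illustrative only: share of reads at the site that support the SV."""
    total = sv_reads + normal_reads
    return sv_reads / total if total else 0.0


# Hypothetical counts: breakend 1 has few normal reads, breakend 2 has many.
normal = approx_normal_reads(2, 9)
print(normal)                       # 2
print(sv_read_fraction(6, normal))  # 0.75
```

Taking the minimum keeps the SV-supporting fraction high when only one breakend is affected, which is how unbalanced BNDs are retained. The same choice can overstate support at the other breakend, which is the source of the false positives mentioned above.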