Skip to content

Latest commit

 

History

History
121 lines (107 loc) · 4.77 KB

README.md

File metadata and controls

121 lines (107 loc) · 4.77 KB

mugio

Inverted Repeat Junction identifier for use with ONT (MinION) sequencing.

Mugio is intended to aid in the identification of potential inverted repeats in seqeuncing data from Oxford-Nanopore Technologies Nanopore seqeuncing platform (MinION, PromethION, GridION, etc.).

Important note for NYU HPC users: You can load all necessary modules via:

source demo/module_load.sh

Citation

Spealman P, Burrell J, Gresham D. Inverted duplicate DNA sequences increase translocation rates through sequencing nanopores resulting in reduced base calling accuracy. Nucleic Acids Res. 2020 May 21;48(9):4940-4945. doi: 10.1093/nar/gkaa206. PMID: 32255181; PMCID: PMC7229812.

Requirements

Materials

Mugio reguires the user supplied data:

  • Fastq generated by Albacore or Guppy
  • Aligned bam file generated by Minimap2

Mugio reguires the following programs:

  • gzip 1.5+
  • python 2.7+ or 3.6+
  • samtools 1.6+
  • bedtools 2.26.0+
  • (optional) The --blast command requires blast+ 2.9.0+

Mugio reguires the following python packages:

  • numpy
  • scipy.stats
  • matplotlib
  • pandas

Installation

Mugio is a stand alone python script as such it can be run locally by merely downloading the script. Installation through git clone is the preferred method. To download

git clone https://github.com/pspealman/mugio.git

To test installation

cd mugio
python mugio.py -demo

Quick Start pipeline for inverted repeat junction identification and evaluation

Command --bprd (breakpoint retrieval and definition)

  • Purpose: Identifies loci likely to be inverted repeat junctions associated with inverted duplications.
  • Format:
python mugio.py -bprd -f <fastq_file> -s <sam_file> -bam <bam_file> -o <output_path_and/or_file_prefix>
  • Demo:
python mugio.py -bprd -f demo/demo.fastq -s demo/demo.sam -bam demo/demo.bam -o demo_output/demo_bprd
  • Results: Identified likely candidates are recorded in the out_put path as a bed file with the suffix '_bprd'. Therefore if '-o demo_output/demo_bprd' the results will be stored in 'demo_output/demo_bprd_bprd.bed'

Command --evaluate

  • Purpose: Calculates the correlation (Spearman's rho) between pre-breakpoint seqeunce length and post-breakpoint low scoring region length.
  • Format:
python mugio.py --evaluate [-bpf bprd.bed | -snf sniffles.vcf] -s <sam_file> -o <output_path_and/or_file_prefix>
  • Demo:
python mugio.py --evaluate -bpf demo_output/demo_bprd_bprd.bed -f demo/demo.fastq -s demo/demo.sam -o demo/demo_bprd_lengths
  • Results: Identified candidates that have closed low-phred regions with trace figures generated with low scoring regions identified. These will be saved in the out_path path as sub folders named after the inverted repeat junctions coordinates.

Optional modes

View phred score trace of a read (--plot_phred or -pp)

  • Purpose: Generate a figure showing the Phred score of each nucleotide (blue) as well as the median phread score calculated over a 1K window (orange).
  • Format (for a single read):
python mugio.py -pp -f <fastq_file> -uid <read uid> -o <output_path_and/or_file_name>
  • Demo:
python mugio.py -pp -f demo/demo.fastq -uid c1d4f6dc-cc1f-4431-a6a0-1b9f5109342c -o c1d4f6dc.png
  • Format (for all reads in a fastq):
python mugio.py -pp -f <fastq_file> -o <output_path_and/or_file_name>
  • Demo:
python mugio.py -pp -f demo/demo.fastq -o demo_
  • Results: Figure showing and overlay of the immediate phred score and a trace of the median phred score across a rolling window. Note: the because the window is 1000 nt (by defualt) the median trace will stop 100 nt before the end of the phred score trace.

Calculate chromosome and genome coverage (--coverage_calculator or -cc)

  • Purpose: Generate a table with read depth metircs such as median depth, standard deviation, genome wide relative median depth, etc.
  • Format:
python mugio.py -cc [-filter <chromo_name>] -f <fastq_file> -s <sam_file> -o <output_path_and/or_file_prefix>
  • Demo:
python mugio.py -cc -filter NC_001224.1 -f ont_DGY1657/BC01_1657.fastq -s ont_DGY1657/BC01_1657.sam -o ont_DGY1657/cc_BC01_1657
  • Results: Tab-delimited file containing rows for each scaffold, chromosome, as well as total (assumbed to be the genome). Columns are various metrics of the aligned read-depth.

Generate a discordant read bam file (--get_discordant or -get)

  • Purpse: Ge
  • Format:
python mugio.py --get -f <sample_fastq> -s <sample_aligned.sam> -pct <float> -o <output_prefix>
  • Demo:
python mugio.py --get_discordant -f fastq/DGY1726.fastq -s bam/DNA_DGY1726.sam -pct 0 -o mugio/DGY1726_1
  • Results: Any read that is over some percentage (-pct) of non-mapping nuceltotides will get extracted and resolved into a bam alignment file.