This repository archives the pipelines and source code used in the LongBow manuscript.
The contents are organized into six main folders. Please feel free to click on any title to view the detailed README.md.

- Assessment of the availability and metadata labeling of Oxford Nanopore sequencing data in the SRA database
- Benchmark of Clair3, Shasta, and Medaka with correct or wrong configs/models
- Instructions for training the LongBow model using Nanopore raw data from various model organisms, with details on the training data and on how to perform a leave-one-out test to determine the best lag for autocorrelation analysis
- Instructions for testing LongBow on 66 independent groups of ONT data and human ONT SRA data
- Repart and reanalysis of COGUK SARS-CoV-2 data
- Others: other pipelines in the manuscript

The code was tested on the following Linux releases:

- Red Hat Enterprise Linux 8
- Ubuntu 22.04.1

Most of the following software is installed through Conda environments. We have tested with Conda versions 24.1.2 and 24.4.0.
We strongly recommend installing Conda version >= 24.1.x.
You can follow the Conda manual to install Conda.
We provide the conda `.yaml` file for each running environment. The installation of each conda environment may take several minutes, depending on your system and network.
To run the Python scripts we provided, Python 3.7 or a higher version is required.
Software | Version |
---|---|
Artic | 1.2.4 |
bcftools | 1.19 |
Bioawk | 20110810 |
Chopper | 0.7.0 |
Clair3 | 1.0.4, 1.0.10 |
Flye | 2.9.3-b1797 |
hdf5 | 1.12.1 |
Medaka | 1.11.3 |
Minimap2 | 2.26-r1175, 2.28-r1209 |
ont-fast5-api | 4.1.1 |
pod5 | 0.2.4 |
seqtk | 1.3-r106 |
Shasta | 0.11.1 |
yak | 0.1-r56 |
Python | 3.7.3 |
MATLAB | R2023a |
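
The Python and Conda version requirements stated above can be checked before installing the environments. The following is a minimal sketch and is not part of the repository: the helper names `parse_version`, `meets_minimum`, and `conda_version` are hypothetical, and it assumes that `conda --version` prints a string such as `conda 24.1.2`.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 7)   # minimum Python version required by the scripts
MIN_CONDA = (24, 1)   # recommended minimum Conda version (>= 24.1.x)


def parse_version(text):
    """Extract the first dotted version number from text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version, minimum):
    """Return True if version is at least minimum (tuple comparison)."""
    return tuple(version) >= tuple(minimum)


def conda_version():
    """Return the installed Conda version as a tuple, or None if unavailable."""
    if shutil.which("conda") is None:
        return None
    try:
        result = subprocess.run(
            ["conda", "--version"], capture_output=True, text=True, check=True
        )
        return parse_version(result.stdout or result.stderr)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


if __name__ == "__main__":
    python_ok = meets_minimum(sys.version_info[:3], MIN_PYTHON)
    print("Python", ".".join(map(str, sys.version_info[:3])),
          "OK" if python_ok else "too old")

    found = conda_version()
    if found is None:
        print("Conda not found on PATH")
    else:
        conda_ok = meets_minimum(found, MIN_CONDA)
        print("Conda", ".".join(map(str, found)),
              "OK" if conda_ok else "older than recommended")
```

Running the script prints one status line for Python and one for Conda; it only reports versions and does not install or modify anything.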