This folder archives the scripts for downloading data from EBI, randomly selecting an Artic (Medaka) model, running the Artic and Artex pipelines, and post analysis.
The conda environments used are listed in `./envs`. Use `conda env create` to recreate each environment:
- Minimap2: `conda env create -f ./envs/minimap2.yaml`
- bcftools: `conda env create -f ./envs/bcftools.yaml`
- Artic: `conda env create -f ./envs/artic.yaml`
- Artex: `conda env create -f ./envs/artex.yaml`
- hap.py: `conda env create -f ./envs/hap.yaml`
- bioawk: `conda env create -f ./envs/bioawk.yaml`
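Every environment above follows the same `conda env create -f <file>` pattern, so the commands can also be generated from the contents of `./envs`. The following is a minimal Python sketch, not a script shipped in this folder: `conda_create_commands` and `create_all` are hypothetical helpers, and the sketch assumes every environment file in `./envs` ends in `.yaml`. By default it only prints the commands (dry run).

```python
from pathlib import Path
import subprocess


def conda_create_commands(env_dir: str = "./envs") -> list[list[str]]:
    """Build one `conda env create -f <yaml>` command per YAML file in env_dir."""
    # Sorted so the commands come out in a stable, alphabetical order.
    yamls = sorted(Path(env_dir).glob("*.yaml"))
    return [["conda", "env", "create", "-f", str(path)] for path in yamls]


def create_all(env_dir: str = "./envs", dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in conda_create_commands(env_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first environment that fails.
            subprocess.run(cmd, check=True)
```

Calling `create_all(dry_run=False)` from this folder would run the same six commands listed above, one after another.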
The analysis consists of the following steps:

- Preprocess COG-UK EBI metadata: select the COG-UK sample IDs that have both NGS and ONT sequencing, and retrieve the download links for the raw ONT sequences and the analysis files (NGS-assembled and ONT-assembled consensus sequences).
  `bash ./ebi_preprocess_scripts/preprocess.sh`
- Download data: download the raw ONT sequencing data, the NGS-assembled and ONT-assembled consensus sequences, the SARS-CoV-2 (COVID-19) reference files, and the Artic V3 primer scheme files.
  `bash ./download_scripts/download_all.sh`
- LongBow config prediction: run LongBow on all raw ONT sequencing data and retrieve the predicted basecalling configuration.
  `bash ./longbow_pred_script/run_longbow.sh`
- Run the Artic pipeline with three different Medaka model settings:
  1. the LongBow-predicted Medaka model;
  2. a random Medaka model selected with the Python `random` package;
  3. the default Medaka model.

  `bash ./artic_scripts/run_artic.sh`
- Run the Artex (Artic extension) pipeline, which adds an extra Clair3 variant re-calling step.
  `bash ./artex_scripts/run_artex.sh`
- Post analysis: evaluate the variant-calling F1-score for each scheme, and find the extra variants called by the Artex pipeline compared with the traditional Artic pipeline.
  `bash ./post_analysis/post_analysis.sh`
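The preprocessing step keeps only samples sequenced on both platforms. Below is a minimal sketch of that filter. It assumes the EBI metadata has already been reduced to `(sample_id, platform)` pairs. The function name `samples_with_both` and the default platform labels `ILLUMINA` and `OXFORD_NANOPORE` are assumptions for illustration, not necessarily what `./ebi_preprocess_scripts/preprocess.sh` uses. Retrieving the download links is not covered.

```python
from collections import defaultdict
from typing import Iterable


def samples_with_both(records: Iterable[tuple[str, str]],
                      ngs_platform: str = "ILLUMINA",
                      ont_platform: str = "OXFORD_NANOPORE") -> list[str]:
    """Return the sorted sample IDs that have at least one NGS run and one ONT run."""
    platforms: dict[str, set[str]] = defaultdict(set)
    for sample_id, platform in records:
        platforms[sample_id].add(platform)
    wanted = {ngs_platform, ont_platform}
    # Keep a sample only if both platform labels were seen for it.
    return sorted(s for s, seen in platforms.items() if wanted <= seen)


if __name__ == "__main__":
    # Hypothetical records, not real COG-UK samples.
    demo = [("S1", "ILLUMINA"), ("S1", "OXFORD_NANOPORE"), ("S2", "ILLUMINA")]
    print(samples_with_both(demo))
```

Grouping by sample first means a sample with several runs on the same platform is still counted once.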
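For the random Medaka model setting, `./ERR_list_models.txt` records the seed used for each ERR run, so each draw can be reproduced. Below is a minimal sketch of seeded selection with Python's `random` package. The candidate model names and the function `pick_random_model` are illustrative assumptions, not the pool or code used in this study. The sketch uses a private `random.Random(seed)` instance, which produces the same draw as calling `random.seed(seed)` followed by `random.choice(...)` but leaves the global generator untouched.

```python
import random

# Illustrative candidate pool only; the models actually drawn in this study
# are the ones recorded in ./ERR_list_models.txt.
CANDIDATE_MODELS = [
    "r941_min_fast_g303",
    "r941_min_high_g303",
    "r941_min_high_g330",
    "r941_min_high_g351",
    "r941_min_high_g360",
]


def pick_random_model(seed: int, models: list[str] = CANDIDATE_MODELS) -> str:
    """Return a Medaka model name chosen reproducibly from `models`."""
    rng = random.Random(seed)  # private generator seeded per ERR run
    return rng.choice(models)


if __name__ == "__main__":
    # Hypothetical ERR IDs and seeds.
    for err_id, seed in [("ERR_EXAMPLE_1", 1), ("ERR_EXAMPLE_2", 2)]:
        print(err_id, seed, pick_random_model(seed))
```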
`./download_list.txt` has four columns:

| Column | Content |
|---|---|
| 1st column | ERR ID |
| 2nd column | ONT raw reads FASTQ download link |
| 3rd column | COG-UK ONT consensus file download link |
| 4th column | COG-UK NGS consensus file download link |
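A minimal sketch for reading this file into named records. It assumes the four columns are separated by tabs or spaces and that links contain no spaces. The delimiter is not documented here, so adjust the `split()` call if the file uses something else. `DownloadEntry` and `read_download_list` are hypothetical names.

```python
from typing import NamedTuple


class DownloadEntry(NamedTuple):
    err_id: str
    ont_fastq_url: str
    ont_consensus_url: str
    ngs_consensus_url: str


def read_download_list(path: str = "./download_list.txt") -> list[DownloadEntry]:
    """Parse the four-column download list, skipping blank lines."""
    entries = []
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            fields = line.split()  # assumes whitespace- or tab-separated columns
            if not fields:
                continue
            if len(fields) != 4:
                raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
            entries.append(DownloadEntry(*fields))
    return entries
```

Failing loudly on a malformed row is deliberate: a shifted column would otherwise silently pair an ONT FASTQ with the wrong consensus file.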
- `../results/longbow.log`: the LongBow prediction results for each of the 269 FASTQ files; in our case, all were predicted as R9 Guppy 3/4 HAC.
- `./ERR_list_models.txt`: the ERR ID, random seed, and random Medaka model for each ERR dataset.
- `../results/ERR*`: each `ERR*` directory contains 4 subdirectories:
  - `longbow`: results with the LongBow-predicted Medaka model;
  - `default`: results with the default Medaka model;
  - `random`: results with the random Medaka model listed in `./ERR_list_models.txt`;
  - `Artex`: results of the Artex pipeline.
- `../results/F1_score_file`: the F1-score details for SNPs and INDELs under each separate mode.
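The F1-score combines precision $P = TP/(TP+FP)$ and recall $R = TP/(TP+FN)$ as $F_1 = 2PR/(P+R)$, which simplifies to $2TP/(2TP+FP+FN)$. Below is a minimal sketch that computes it from true-positive, false-positive, and false-negative counts, separately for SNPs and INDELs. It illustrates the metric only and does not reproduce `./post_analysis/post_analysis.sh`. The counts in the usage block are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), with precision P = TP/(TP+FP) and recall R = TP/(TP+FN)."""
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined, so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical (TP, FP, FN) counts for one run; not results from this study.
    counts = {"SNP": (40, 1, 2), "INDEL": (3, 1, 1)}
    for var_type, (tp, fp, fn) in counts.items():
        print(var_type, round(f1_score(tp, fp, fn), 3))
```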
To reproduce our results, install the conda environments listed above and run:
`bash ./run_all.sh`
- Artic pipeline: if you encounter `medaka: error: argument command: invalid choice: 'consensus'`, conda has installed a newer medaka version that is not compatible with Artic. Try pinning the compatible version:
  `conda install -c bioconda -c conda-forge artic medaka=1.11.3`