Repeat and reanalysis of COG-UK data

Description

This folder archives the scripts for downloading data from EBI, randomly selecting Medaka models for Artic, running Artic and Artex, and post analysis.

Execution environments

The conda environments used are listed in ./envs. Use conda env create to recreate each environment:

Minimap2

conda env create -f ./envs/minimap2.yaml;

bcftools

conda env create -f ./envs/bcftools.yaml;

Artic

conda env create -f ./envs/artic.yaml;

Artex

conda env create -f ./envs/artex.yaml;

hap.py

conda env create -f ./envs/hap.yaml;

bioawk

conda env create -f ./envs/bioawk.yaml;
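
The six commands above can also be run as a loop. The following is a minimal sketch, assuming every environment is described by a .yaml file directly under ./envs; the function name create_envs and the DRY_RUN switch are hypothetical helpers, not part of this repository.

```shell
# Sketch: create one conda environment per YAML file in a folder.
# create_envs and DRY_RUN are illustrative names, not repository code.
create_envs() {
    local env_dir="${1:-./envs}"
    local yaml
    for yaml in "$env_dir"/*.yaml; do
        # Skip the unexpanded glob when the folder has no YAML files.
        [ -e "$yaml" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be run.
            echo "conda env create -f $yaml"
        else
            conda env create -f "$yaml"
        fi
    done
}
```

Calling DRY_RUN=1 create_envs ./envs prints the commands without touching conda, which is a cheap way to check the file list first.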

Pipelines

Preprocess COG-UK EBI metadata. Purpose: identify the COG-UK sample ids that have both NGS and ONT sequencing, and retrieve the download links for the raw ONT sequences and the analysis files (NGS-assembled and ONT-assembled consensus sequences).

bash ./ebi_preprocess_scripts/preprocess.sh;

Download data. Download the raw ONT sequencing data, the NGS-assembled and ONT-assembled consensus sequences, the COVID-19 reference files, and the Artic primer scheme V3 files.

bash ./download_scripts/download_all.sh;

LongBow config prediction. Run LongBow on all raw ONT sequencing data and retrieve the predicted basecalling configuration.

bash ./longbow_pred_script/run_longbow.sh;

Run Artic pipeline. Run the Artic pipeline with three different Medaka model settings: (1) the LongBow-predicted Medaka model, (2) a random Medaka model chosen with the Python random package, and (3) the default Medaka model.

bash ./artic_scripts/run_artic.sh;
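
The random setting relies on a seeded draw so that each ERR sample can be rerun with the same model. The repository does this with the Python random package; the sketch below is a shell stand-in that uses awk's srand instead, so it will not reproduce the seeds and models recorded by the real scripts. The function name pick_random_model is hypothetical, and the candidate model list is passed in by the caller rather than taken from Medaka.

```shell
# Sketch: seeded random choice of one model name from a candidate list.
# Usage: pick_random_model SEED MODEL [MODEL...]
# Not the repository's method (that uses Python's random package).
pick_random_model() {
    local seed="$1"
    shift
    # The same seed gives the same pick for a given awk implementation.
    printf '%s\n' "$@" | awk -v seed="$seed" '
        BEGIN { srand(seed) }
        { models[NR] = $0 }
        END { if (NR > 0) print models[int(rand() * NR) + 1] }'
}
```

For example, pick_random_model 42 r941_min_high_g360 r941_min_fast_g303 r941_min_hac_g507 prints one of the three names, and prints the same one on every rerun with seed 42 on the same machine. The model names here are only illustrative inputs.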

Run Artex pipeline. Run the Artex (Artic extension) pipeline, which adds an extra Clair3 variant re-calling step.

bash ./artex_scripts/run_artex.sh;

Post analysis. Evaluate the F1-score of variant calling for each scheme, and find the extra variants called by the Artex pipeline compared with the traditional Artic pipeline.

bash ./post_analysis/post_analysis.sh;
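
For reference, the F1-score is the harmonic mean of precision and recall, which reduces to 2TP / (2TP + FP + FN) for true positive, false positive, and false negative counts. The sketch below only illustrates that formula; the function name f1_score is hypothetical, and the repository's own post analysis scripts remain the source of the reported numbers.

```shell
# Sketch: F1-score from true positive, false positive and false negative counts.
# Usage: f1_score TP FP FN
f1_score() {
    awk -v tp="$1" -v fp="$2" -v fn="$3" 'BEGIN {
        denom = 2 * tp + fp + fn
        # F1 is undefined when there are no calls and no truth variants.
        if (denom == 0) { print "NA"; exit }
        printf "%.4f\n", 2 * tp / denom
    }'
}
```

With 90 true positives, 10 false positives, and 10 false negatives, f1_score 90 10 10 prints 0.9000.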

Results description

./download_list.txt

column	Content
1st column	ERR id
2nd column	ONT raw reads FASTQ download link
3rd column	COG-UK ONT consensus file download link
4th column	COG-UK NGS consensus file download link
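
A file with this layout can be read column by column in a shell loop. The following is a minimal sketch, assuming the columns are separated by tabs or spaces and that the file has no header row; the function name list_downloads is hypothetical and is not the logic of download_all.sh.

```shell
# Sketch: expand each row of the download list into one line per file to fetch.
# Usage: list_downloads [FILE]   (prints "ERR_ID<TAB>URL" lines)
list_downloads() {
    local list_file="${1:-./download_list.txt}"
    local err fastq ont_cons ngs_cons
    while read -r err fastq ont_cons ngs_cons; do
        # Skip blank lines.
        [ -n "$err" ] || continue
        printf '%s\t%s\n' "$err" "$fastq"      # raw ONT reads
        printf '%s\t%s\n' "$err" "$ont_cons"   # ONT consensus
        printf '%s\t%s\n' "$err" "$ngs_cons"   # NGS consensus
    done < "$list_file"
}
```

The second column of the output can then be handed to whichever downloader is preferred.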

../results/longbow.log Contains the LongBow prediction results for each of the 269 FASTQ files; in our case, all are R9 Guppy 3/4 HAC.
./ERR_list_models.txt Contains the ERR id, random seed, and random Medaka model for each ERR dataset.
../results/ERR* Each ERR* folder contains 4 subdirectories: longbow (LongBow-predicted Medaka model), default (default Medaka model), random (random Medaka model listed in ./ERR_list_models.txt), and Artex (results of the Artex pipeline).
../results/F1_score_file Contains the detailed F1 scores for each mode, reported separately for SNPs and INDELs.
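
After a run, the per-sample layout described above can be checked with a short loop. This is a sketch only, assuming the four subdirectories are named exactly longbow, default, random, and Artex; the function name check_results is hypothetical.

```shell
# Sketch: report ERR* result folders that lack an expected subdirectory.
# Usage: check_results [RESULTS_DIR]   (returns non-zero if anything is missing)
check_results() {
    local results_dir="${1:-../results}"
    local sample sub status=0
    for sample in "$results_dir"/ERR*; do
        # Ignore plain files and the unexpanded glob.
        [ -d "$sample" ] || continue
        for sub in longbow default random Artex; do
            if [ ! -d "$sample/$sub" ]; then
                echo "missing: $sample/$sub"
                status=1
            fi
        done
    done
    return "$status"
}
```

Running check_results ../results prints nothing and returns 0 when every sample folder is complete.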

Repeat our results

To repeat our results, install the conda environments listed above and run

bash ./run_all.sh;

Possible error

Artic pipeline. If you encounter medaka: error: argument command: invalid choice: 'consensus', conda has installed a newer medaka version instead of the compatible one. Try running the following command:

conda install -c bioconda -c conda-forge artic medaka=1.11.3
