This directory archives the testing pipeline used to evaluate the performance
of LongBow. The test data fall into two categories:
- Our own basecalled data, generated from raw FAST5/POD5 data;
- The top 10,000 sequences of human ONT sequencing data in the SRA database until 2024-Jan-9.
Use the LongBow conda environment
conda env create -f longbow.yaml;
Please download the 66 groups of test data that we shared through ScienceDB into the ../data
directory and decompress them before running the pipeline below.
The data are available at https://www.scidb.cn/en/detail?dataSetId=47fc05aee6be46719aeb7cf03cfc70bf
Follow the instructions below to download and decompress the shared data.
mkdir -p ../data;
cd ../data;
# Download the FASTQ file through ScienceDB
# Decompress the archive
tar -zxvf sixty_six_samples.tar.gz;
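Before extracting, it can help to confirm that the download completed. The following is a minimal sketch, assuming the archive is gzip-compressed as the `-z` flag above implies. The helper name `check_archive` is hypothetical and is not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): verify that a
# gzip-compressed tar archive exists and can be read end to end.
check_archive() {
    local archive="$1"
    if [ ! -f "$archive" ]; then
        echo "not found: $archive"
        return 1
    fi
    # tar -tzf lists the contents without extracting anything;
    # it exits non-zero if the archive is corrupt or truncated.
    if tar -tzf "$archive" > /dev/null 2>&1; then
        echo "archive OK: $archive"
    else
        echo "archive unreadable: $archive"
        return 1
    fi
}

# Usage (file name taken from the command above):
# check_archive sixty_six_samples.tar.gz
```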
To install LongBow, please follow the instructions in longbow2.2.0/README.md.
Note that the LongBow code is not provided in this repository.
bash run_own_called_longbow.sh;
bash analysis_own_called_result.py;
bash download_all_sra_ont.sh;
bash run_sra_ont_longbow.sh;
bash analysis_sra_ont.sh;
- LongBow test results on our own basecalled data are in ../results/own_called.csv;
- LongBow test results on ONT SRA data are in ../results/sra_ont_results.csv.
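Once the pipeline has finished, a quick sanity check can confirm that both result files exist and contain data. The sketch below is only an illustration. The helper name `check_result` is hypothetical, and it assumes that each CSV starts with a single header line, which this README does not state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the number of data rows in a result CSV.
# Assumption (not confirmed by this README): the first line is a header.
check_result() {
    local csv="$1"
    # -s is true only if the file exists and is not empty.
    if [ ! -s "$csv" ]; then
        echo "missing or empty: $csv"
        return 1
    fi
    # Count all lines with awk, then subtract the assumed header line.
    local total
    total=$(awk 'END { print NR }' "$csv")
    echo "$csv: $((total - 1)) data rows"
}

# Usage (paths taken from the list above):
# check_result ../results/own_called.csv
# check_result ../results/sra_ont_results.csv
```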
To reproduce our results, create the conda environment described above, download the shared data, and then run
bash ./run_all.sh;
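The contents of run_all.sh are not shown in this README. As a rough illustration only, individual pipeline scripts could be chained so that execution stops at the first failing step. The function name `run_steps` is hypothetical, and this is not necessarily how run_all.sh is implemented.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run each given script in order with bash and
# stop at the first one that exits with a non-zero status.
run_steps() {
    local script
    for script in "$@"; do
        echo "running $script"
        if ! bash "$script"; then
            echo "step failed: $script" >&2
            return 1
        fi
    done
    echo "all steps finished"
}

# Usage, e.g. with two of the scripts named in this README:
# run_steps run_own_called_longbow.sh run_sra_ont_longbow.sh
```

Stopping at the first failure keeps a later analysis step from running on incomplete output.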