Genome Graphs Benchmark Suite - GGBS

   █████████    █████████  ███████████   █████████ 
  ███░░░░░███  ███░░░░░███░░███░░░░░███ ███░░░░░███
 ███     ░░░  ███     ░░░  ░███    ░███░███    ░░░ 
░███         ░███          ░██████████ ░░█████████ 
░███    █████░███    █████ ░███░░░░░███ ░░░░░░░░███
░░███  ░░███ ░░███  ░░███  ░███    ░███ ███    ░███
 ░░█████████  ░░█████████  ███████████ ░░█████████ 
  ░░░░░░░░░    ░░░░░░░░░  ░░░░░░░░░░░   ░░░░░░░░░

GGBS is the first implemented benchmark suite for sequence-to-graph alignment in genomic analysis. It includes multiple state-of-the-art tools and six different genome graphs built with VG Toolkit¹. The included tools are:

Astarix ²
GraphAligner ³
GWFA ⁴
SGA ⁵
V-ALign ⁶
VG Toolkit Giraffe ⁷
GEDMAP ⁸

Select the tools and versions you want to benchmark in tools_config.yml.

Installation

Install dependencies

Docker >= 24.0
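
One way to confirm that the installed Docker meets this requirement is to parse the output of `docker --version`. The sketch below assumes the usual output shape, `Docker version X.Y.Z, build ...`. The helper names are illustrative and not part of GGBS.

```python
import re
import shutil
import subprocess

MIN_DOCKER = (24, 0)  # minimum version stated in the dependency list


def parse_docker_version(text):
    """Extract (major, minor) from text such as 'Docker version 24.0.5, build ced0996'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return int(match.group(1)), int(match.group(2))


def docker_is_recent_enough(text, minimum=MIN_DOCKER):
    """Return True when the reported version is at least `minimum`."""
    return parse_docker_version(text) >= minimum


def installed_docker_version_text():
    """Return the output of `docker --version`, or None if docker is not on PATH."""
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(["docker", "--version"], capture_output=True, text=True)
    return result.stdout.strip()
```

Calling `docker_is_recent_enough(installed_docker_version_text())` on a machine with Docker installed gives a yes/no answer before any image is built.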

Download and install

First, get the repo:

git clone https://github.com/Mirkocoggi/GGBS.git

Then move into the repository folder:

cd GGBS

Usage

Create the dockerfiles

The first step is to generate a Dockerfiles folder containing one subfolder per tool, each holding that tool's Dockerfile. Run:

python make_dockerfiles.py
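
After the script finishes, the generated layout can be checked with a small helper. This is a minimal sketch, assuming the folder is named `Dockerfiles` and each file is named exactly `Dockerfile`. The function `missing_dockerfiles` is hypothetical and not part of GGBS.

```python
from pathlib import Path


def missing_dockerfiles(root="Dockerfiles"):
    """Return the names of tool subfolders under `root` that lack a Dockerfile."""
    root = Path(root)
    return sorted(
        sub.name
        for sub in root.iterdir()
        if sub.is_dir() and not (sub / "Dockerfile").is_file()
    )
```

An empty list means every tool subfolder has its Dockerfile. Any name returned points to a tool whose Dockerfile was not generated.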

Data Selection

The input_data folder contains the genome graphs and the sequence reads to be aligned. Each alignment experiment has its own folder, comprising two subfolders:

GRAPH: contains the input graph in GFA format;
READS: contains the sequence reads in FASTA/Q format.

The experiment folders are grouped into two higher-level folders:

TEST: contains the alignments to be executed with the previously selected tools;
IGNORE: contains alignment experiments that should not be included in the evaluation.
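
The folder layout described above can be inspected before launching a run. The sketch below assumes that TEST sits directly under input_data, that graphs use a `.gfa` extension, and that reads use one of the listed FASTA/Q extensions. These names and the helper `list_experiments` are assumptions for illustration, not part of GGBS.

```python
from pathlib import Path

# Assumed read-file extensions; adjust if the reads are named differently.
READ_SUFFIXES = {".fa", ".fasta", ".fq", ".fastq"}


def list_experiments(input_data="input_data"):
    """Map each experiment folder under TEST to its graph and read file names."""
    experiments = {}
    test_dir = Path(input_data) / "TEST"
    for exp in sorted(p for p in test_dir.iterdir() if p.is_dir()):
        graph_dir, reads_dir = exp / "GRAPH", exp / "READS"
        graphs = (
            sorted(f.name for f in graph_dir.iterdir() if f.suffix.lower() == ".gfa")
            if graph_dir.is_dir() else []
        )
        reads = (
            sorted(f.name for f in reads_dir.iterdir() if f.suffix.lower() in READ_SUFFIXES)
            if reads_dir.is_dir() else []
        )
        experiments[exp.name] = {"graphs": graphs, "reads": reads}
    return experiments
```

Experiments placed under IGNORE are skipped because only TEST is scanned. An experiment with an empty `graphs` or `reads` list is likely misplaced or incomplete.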

Create docker compose file

Create the docker-compose.yml file, which builds all the Docker images and executes all the experiments. Run:

python make_dockercompose.py

Run the experiments

Run the experiments with the command:

docker compose up

Results Collection

Results are written to the results folder, where a subdirectory named with a timestamp is created for each experiment. To collect all the execution times, generate a summary_timing.csv file by running:

python utils/timing.py results/<timestamp>

The summary file is generated in results/<timestamp>/summary_timing.csv.
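
The summary can then be loaded for further analysis. This is a minimal sketch that makes no assumption about the column names written by utils/timing.py: each row is returned as a dictionary keyed by whatever header the file contains. The helper `load_timing_summary` is hypothetical.

```python
import csv
from pathlib import Path


def load_timing_summary(run_dir):
    """Read <run_dir>/summary_timing.csv into a list of row dictionaries."""
    path = Path(run_dir) / "summary_timing.csv"
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `load_timing_summary("results/<timestamp>")`, with the placeholder replaced by an actual run folder, returns one dictionary per CSV row, with all values as strings.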

¹ E. Garrison, J. Sirén, A. M. Novak, G. Hickey, J. M. Eizenga, E. T. Dawson, W. Jones, S. Garg, C. Markello, M. F. Lin et al., “Variation graph toolkit improves read mapping by representing genetic variation in the reference,” Nature Biotechnology, vol. 36, no. 9, pp. 875–879, 2018.
² P. Ivanov, B. Bichsel, and M. Vechev, “Fast and optimal sequence-to-graph alignment guided by seeds,” in International Conference on Research in Computational Molecular Biology. Springer, 2022, pp. 306–325.
³ M. Rautiainen, V. Mäkinen, and T. Marschall, “Bit-parallel sequence-to-graph alignment,” Bioinformatics, vol. 35, no. 19, 2019.
⁴ H. Zhang, S. Wu, S. Aluru, and H. Li, “Fast sequence to graph alignment using the graph wavefront algorithm,” arXiv preprint arXiv:2206.13574, 2022.
⁵ C. Jain, H. Zhang, Y. Gao, and S. Aluru, “On the complexity of sequence-to-graph alignment,” Journal of Computational Biology, vol. 27, no. 4, pp. 640–654, 2020.
⁶ V. N. S. Kavya, K. Tayal, R. Srinivasan, and N. Sivadasan, “Sequence alignment on directed graphs,” Journal of Computational Biology, vol. 26, no. 1, pp. 53–67, 2019.
⁷ J. Sirén, J. Monlong, X. Chang, A. M. Novak, J. M. Eizenga, C. Markello, J. A. Sibbesen, G. Hickey, P.-C. Chang, A. Carroll et al., “Genotyping common, large structural variations in 5,202 genomes using pangenomes, the giraffe mapper, and the vg toolkit,” bioRxiv, pp. 2020–12, 2020.
⁸ T. Büchler, J. Olbrich, and E. Ohlebusch, “Efficient short read mapping to a pangenome that is represented by a graph of ed strings,” Bioinformatics, vol. 39, no. 5, p. btad320, 2023.
