Dryad

Dryad is a Nextflow pipeline for examining prokaryote relatedness. Dryad can perform a reference free analysis and/or SNP analysis.

Dryad analyzes FASTA files that have been processed by either Spriggan or PHoeNIx. Dryad is split into two major workflows:

- An alignment-based workflow for fine-scale investigation within a single outbreak. It uses a reference genome to determine relatedness and SNP distances. By default, the reference is excluded when the tree is built, producing a phylogenetic tree that gives a high-resolution view of a single outbreak.
- An alignment-free workflow for identifying historical relatedness across multiple years and multiple outbreaks without a reference. It gives a low-resolution view of historical relatedness.

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data. Dryad requires Nextflow version 24.04.2.5914 or later.
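To confirm that an installed Nextflow meets this minimum, a version comparison can be sketched in shell. This is a hypothetical helper, not part of Dryad: the function name version_ge is made up, it relies on sort -V (available in GNU coreutils), and it assumes that nextflow -v prints a line whose last field is the version number.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dryad): succeed if version $1 >= version $2.
# Relies on "sort -V" (GNU coreutils) to order dotted version strings.
version_ge() {
    # If $2 sorts first (or the two are equal), then $1 is at least $2.
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="24.04.2.5914"
# Assumption: "nextflow -v" prints e.g. "nextflow version 24.04.2.5914",
# so the last field is the version. The check is skipped if nextflow is absent.
if command -v nextflow >/dev/null 2>&1; then
    installed="$(nextflow -v | awk '{ print $NF }')"
    if version_ge "$installed" "$required"; then
        echo "Nextflow $installed meets the minimum ($required)"
    else
        echo "Nextflow $installed is older than the required $required" >&2
    fi
fi
```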

To run an alignment-free comparison, use:

nextflow run wslh-bio/dryad \
   -latest \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --alignment_free

Alternatively, to run an alignment-based comparison, use:

nextflow run wslh-bio/dryad \
   -latest \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --fasta <REFERENCE_FASTA> \
   --alignment_based

To run both an alignment-based and an alignment-free comparison, use:

nextflow run wslh-bio/dryad \
   -latest \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --fasta <REFERENCE_FASTA> \
   --alignment_based \
   --alignment_free

Nextflow caches previously run pipelines, which can result in an older version of a pipeline being used. To get the most up-to-date version of a pipeline like Dryad, use the -latest option.

Input

Prepare a samplesheet with your input data, with each row representing one FASTA file. The samplesheet will look as follows:

samplesheet.csv:

sample,fasta
sample_1,2024_1.contigs.fa
sample_2,2024_2.contigs.fa
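For more than a handful of assemblies, the samplesheet can be generated rather than typed. The shell sketch below is hypothetical and not part of Dryad: it assumes every assembly sits in one directory with the suffix .contigs.fa, and it derives each sample name from the file name, which differs from the sample_1/sample_2 naming in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dryad): print a samplesheet with a
# "sample,fasta" header and one row per *.contigs.fa file in a directory.
# Sample names are the file names with the .contigs.fa suffix removed;
# FASTA paths are written as absolute paths.
make_samplesheet() {
    local dir abs f
    dir="$1"
    abs="$(cd "$dir" && pwd)"   # absolute form of the input directory
    printf 'sample,fasta\n'
    for f in "$abs"/*.contigs.fa; do
        [ -e "$f" ] || continue   # glob matched nothing: skip the literal pattern
        printf '%s,%s\n' "$(basename "$f" .contigs.fa)" "$f"
    done
}
```

Usage would look like make_samplesheet assemblies/ > samplesheet.csv. Writing absolute paths avoids ambiguity about which directory relative paths are resolved from.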

Parameters

Dryad's main parameters and their defaults are shown in the table below:

Parameter	Parameter description and defaults	Example usage
input	Path to a comma-separated file containing information about the samples in the experiment	--input <PATH_TO_SAMPLESHEET>
outdir	Output directory where the results will be saved. An absolute path must be used for storage on cloud infrastructure	--outdir <DESIRED_OUTPUT_PATH>
profile	Denotes how to access containerized software (e.g. docker or singularity).	-profile aws
fasta	Reference FASTA used for alignment-based comparisons. Default is no reference FASTA.	--fasta <PATH_TO_REF_FASTA>
alignment_based	Performs a fine-scale analysis within a single outbreak	--alignment_based
alignment_free	Performs a historical analysis across multiple years and outbreaks	--alignment_free
task.cpus	Number of CPUs to use for Mashtree. Default is 2.	--task.cpus 4
cg_tree_model	Substitution model for IQ-TREE to use. Default is GTR+G.	--cg_tree_model "GTR+G"
parsnp_partition	Minimum partition size for Parsnp, or whether to disable partitioning. Default is --no-partition.*	--parsnp_partition "--min-partition-size 50"
skip_quast	Skips QUAST and its summary step, e.g. if the data was already run through PHoeNIx or another pipeline with a quality check. By default, QUAST is run.	--skip_quast
add_reference	Adds the reference to tree building in IQ-TREE. By default, the reference is removed before tree building.	--add_reference

*If you are running the alignment-based workflow on more than 100 samples, it may be beneficial to use a higher partitioning value than the default of 100. More information can be found in the Parsnp 2.0 paper.

Workflow

1. Universal Steps

Enter assembled FASTA genomes into a samplesheet.
QUAST v5.2.0 is used to determine assembly quality unless --skip_quast is set.
QUAST results are summarized with a custom Python script to increase readability.
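Dryad's own Python summary script is not reproduced here. As a rough illustration of what such a summary can involve, the awk sketch below transposes a single-assembly QUAST report.tsv (one metric name and value per line) into a header row plus one value row, then stacks several of those rows under a single header. The function names are hypothetical, and the sketch assumes every report lists the same metrics in the same order.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not Dryad's script): transpose a two-column QUAST
# report.tsv (metric<TAB>value per line) into one row of metric names
# followed by one row of values.
transpose_report() {
    awk -F'\t' '
        { key[NR] = $1; val[NR] = $2 }
        END {
            for (i = 1; i <= NR; i++) printf "%s%s", key[i], (i < NR ? "\t" : "\n")
            for (i = 1; i <= NR; i++) printf "%s%s", val[i], (i < NR ? "\t" : "\n")
        }' "$1"
}

# Concatenate transposed reports, keeping only the first header row.
# Assumes every report lists the same metrics in the same order.
combine_reports() {
    local first=1 f
    for f in "$@"; do
        if [ "$first" -eq 1 ]; then
            transpose_report "$f"
            first=0
        else
            transpose_report "$f" | tail -n +2
        fi
    done
}
```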

2. Comparison Steps

Historical Comparison
- Mashtree v1.4.6 generates a phylogenetic tree using Mash distances.
Fine-scale Comparison
- Parsnp v2.0.5 is used to perform a core genome alignment.
- IQ-TREE2 v2.3.4 is used for inferring a phylogenetic tree.
- Bootstrapping in IQ-TREE2 requires at least 4 genomes. If fewer than 4 genomes are used, IQ-TREE2 will not perform bootstrapping.
- snp-dists v0.8.2 is used to calculate the SNP distance matrix.
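Because bootstrapping depends on the number of genomes, it can help to check the samplesheet before launching a fine-scale run. The sketch below is a hypothetical pre-flight check, not part of Dryad: it counts non-empty rows after the header and does not count the reference genome, which may or may not be in the tree depending on --add_reference.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Dryad).
# Count data rows (non-empty lines after the header) in a samplesheet.
count_samples() {
    tail -n +2 "$1" | grep -c '[^[:space:]]' || true   # grep exits 1 on zero matches
}

# Return 1 and print a warning if IQ-TREE2 bootstrapping would be skipped,
# i.e. if fewer than 4 genomes are listed. The reference is not counted.
check_bootstrap() {
    local n
    n="$(count_samples "$1")"
    if [ "$n" -lt 4 ]; then
        echo "warning: $n genome(s) in $1; IQ-TREE2 bootstrapping needs at least 4" >&2
        return 1
    fi
    echo "$n genomes in $1; bootstrapping can run"
}
```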

Output

An example of Dryad's output directory structure for both the alignment-based and alignment-free steps is shown below. These directories will not include quast if --skip_quast is used:

alignment_based_output/
├── iqtree
│   └── parsnp.snps.mblocks.treefile
├── parsnp
│   └── parsnp_output
│       ├── parsnp.ggr
│       ├── parsnp.snps.mblocks
│       ├── parsnp.tree
│       └── parsnp.xmfa
├── pipeline_info
│   ├── *.html
│   ├── *.txt
│   └── samplesheet.valid.csv
├── quast
│   ├── *.quast.report.tsv
│   ├── *.transposed.quast.report.tsv
│   └── quast_results.tsv
└── snpdists
    └── snp_dists_matrix.tsv

alignment_free_output/
├── mashtree
│   └── mashtree.bootstrap.dnd
├── pipeline_info
│   ├── *.html
│   ├── *.txt
│   └── samplesheet.valid.csv
└── quast
    ├── *.quast.report.tsv
    ├── *.transposed.quast.report.tsv
    └── quast_results.tsv
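After a run, a quick way to confirm that the notable files exist is to test for their paths relative to the output directory. The check below is a hypothetical convenience, not part of Dryad; the relative paths to pass in come from the directory trees above and may differ between Dryad versions.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check (not part of Dryad): report which of the given
# paths, relative to an output directory, exist. Returns 1 if any are missing.
check_outputs() {
    local outdir="$1" missing=0 f
    shift
    for f in "$@"; do
        if [ -e "$outdir/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: check_outputs alignment_based_output iqtree/parsnp.snps.mblocks.treefile snpdists/snp_dists_matrix.tsv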

Notable output files:

Alignment based

File	Output
quast_results.tsv*	Assembly quality results
snp_dists_matrix.tsv	Pairwise SNP distances between isolates
parsnp.snps.mblocks.treefile	Maximum likelihood phylogenetic tree

*QUAST results will not be present if --skip_quast was used.
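A common next step is to list isolate pairs that fall within a chosen SNP threshold. The awk sketch below is hypothetical, not part of Dryad, and the threshold is up to you. It assumes snp_dists_matrix.tsv uses snp-dists' usual tab-separated layout: a header row with a corner cell followed by sample names, then one row per sample in the same order as the columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dryad): print "isolate1<TAB>isolate2<TAB>distance"
# for every pair whose SNP distance is at or below a threshold.
# Assumes a square, tab-separated matrix whose rows follow the column order.
close_pairs() {
    local matrix="$1" max="$2"
    awk -F'\t' -v max="$max" '
        NR == 1 { for (j = 2; j <= NF; j++) name[j] = $j; next }
        {
            # The row at line NR has its diagonal in column NR; scanning only
            # columns after it reports each pair once (upper triangle).
            for (j = NR + 1; j <= NF; j++)
                if (($j + 0) <= (max + 0)) print $1 "\t" name[j] "\t" $j
        }' "$matrix"
}
```

For example, close_pairs snpdists/snp_dists_matrix.tsv 10 would list every pair separated by at most 10 SNPs.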

Alignment free

File	Output
quast_results.tsv*	Assembly quality results
mashtree.bootstrap.dnd	Neighbor-joining tree based on Mash distances

*QUAST results will not be present if --skip_quast was used.
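To check which isolates ended up in the Mashtree output, leaf names can be pulled out of the Newick file. This is a rough, hypothetical sketch rather than a full Newick parser: it assumes unquoted labels that contain no whitespace, parentheses, commas, colons, or semicolons.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dryad): print the leaf labels of a Newick tree.
# A leaf label directly follows "(" or "," and runs up to the next "(", ")",
# ",", ":" or ";". Internal-node labels (e.g. bootstrap support values) follow
# ")" and are therefore not printed. Quoted labels are not handled.
tree_leaves() {
    tr -d '\n' < "$1" | grep -oE '[(,][^(),:;]+' | cut -c 2-
}
```

Piping the result through wc -l gives the number of isolates in the tree, which can be compared with the number of samplesheet rows.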

Warning

Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
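Following this advice, parameters can be collected in a YAML file and passed with -params-file. The helper below is a hypothetical sketch: the keys mirror the Parameters table, while the file names and values are placeholders to adapt.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write a YAML params file for "nextflow run ... -params-file".
# Keys mirror the Parameters table; the paths and values are placeholders.
write_params() {
    cat > "$1" <<'EOF'
input: samplesheet.csv
outdir: results
fasta: reference.fasta
alignment_based: true
alignment_free: true
cg_tree_model: "GTR+G"
EOF
}

# Example run (commented out so the sketch has no side effects):
# write_params params.yaml
# nextflow run wslh-bio/dryad -latest -profile docker -params-file params.yaml
```

Keeping parameters in a file also leaves a record of exactly which options a given run used.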

Credits

Dryad was written by Dr. Kelsey Florek, Dr. Abigail C. Shockey, and Eva Gunawan.

We thank the bioinformatics group at the Wisconsin State Laboratory of Hygiene for all of their contributions.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

If you use Dryad for your analysis, please cite it using the following:

K. Florek, A.C. Shockey, & E. Gunawan (2014). Dryad (Version 4.0.0) [https://github.com/wslh-bio/dryad].

An extensive list of references for the tools used by Dryad can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
