Dryad is a Nextflow pipeline for examining prokaryote relatedness. Dryad can perform a reference-free analysis, a SNP analysis, or both.
Dryad analyzes FASTA files that have been processed by either Spriggan or PHoeNIx. Dryad is split into two major workflows:
- A workflow dedicated to fine scale investigations within a single outbreak. This alignment based workflow uses a reference to determine relatedness and SNP distances. The reference can be removed from the alignment based workflow to create a phylogenetic tree that gives a high resolution look at a single outbreak.
- A workflow dedicated to identifying historical relatedness across multiple years and multiple outbreaks without the use of a reference. This alignment free workflow gives a low resolution look at historical relatedness.
Usage
Input
Parameters
Workflow
Output
Credits
Contributions-and-Support
Citations
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data. Dryad requires Nextflow version 24.04.2.5914 or later.
To run an alignment free comparison, use:
nextflow run wslh-bio/dryad \
-latest \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR> \
--alignment_free
Alternatively, to run an alignment based comparison, use:
nextflow run wslh-bio/dryad \
-latest \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR> \
--fasta <REFERENCE_FASTA> \
--alignment_based
To run both an alignment based and an alignment free comparison, use:
nextflow run wslh-bio/dryad \
-latest \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR> \
--fasta <REFERENCE_FASTA> \
--alignment_based \
--alignment_free
- Nextflow caches previously run pipelines, which can result in an older version of a pipeline being used. To get the most up-to-date version of a pipeline like Dryad, use the -latest option.
Prepare a samplesheet with your input data, with each row representing one FASTA file. The samplesheet will look as follows:
samplesheet.csv:
sample | fasta |
---|---|
sample_1 | 2024_1.contigs.fa |
sample_2 | 2024_2.contigs.fa |
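For a directory with many assemblies, the samplesheet can be generated rather than typed by hand. The following is a minimal sketch, not part of Dryad: it assumes the CSV header is `sample,fasta` (matching the columns shown above), that assemblies end in `.contigs.fa`, and that the sample name can simply be the file name without that suffix. The function name `build_samplesheet` and the naming rule are hypothetical; adjust them to your own conventions.

```python
import csv
from pathlib import Path


def build_samplesheet(fasta_dir, out_csv, suffix=".contigs.fa"):
    """Write a two-column samplesheet (sample,fasta) for every matching file."""
    fasta_dir = Path(fasta_dir)
    rows = []
    for fasta in sorted(fasta_dir.glob(f"*{suffix}")):
        # Assumed naming rule: sample name = file name minus the suffix.
        sample = fasta.name[: -len(suffix)]
        # Absolute paths keep the samplesheet valid from any working directory.
        rows.append({"sample": sample, "fasta": str(fasta.resolve())})

    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sample", "fasta"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `build_samplesheet("assemblies", "samplesheet.csv")` would write one row per `*.contigs.fa` file found in a directory named `assemblies` (a placeholder name).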
Dryad's main parameters and their defaults are shown in the table below:
Parameter | Parameter description and defaults | Example usage |
---|---|---|
input | Path to comma-separated file containing information about the samples in the experiment | --input <PATH_TO_SAMPLESHEET> |
outdir | Output directory where the results will be saved. Absolute path must be used for storage on cloud infrastructure | --outdir <DESIRED_OUTPUT_PATH> |
profile | Denotes how to access containerized software. | -profile aws |
fasta | Reference fasta used for alignment based comparisons. Default is no reference fasta. | --fasta <PATH_TO_REF_FASTA> |
alignment_based | Performs a fine scale analysis within a single outbreak | --alignment_based |
alignment_free | Performs a historical analysis across multiple years and outbreaks | --alignment_free |
task.cpus | Denotes how many cpus to use for Mashtree. Default task.cpus is 2. | --task.cpus 4 |
cg_tree_model | Tells IQ-TREE what model to use. Default cg_tree_model is GTR+G | --cg_tree_model "GTR+G" |
parsnp_partition | Tells parsnp the minimum partition amount or to not partition. Default is --no-partition.* | --parsnp_partition "--min-partition-size 50" |
skip_quast | Skips QUAST and the summary steps. Use this if the data was already run through PHoeNIx or another pipeline with a quality check. Default is to run QUAST as if quality summaries were not previously run. | --skip_quast |
add_reference | Used to add the reference into tree building for IQ-TREE. Default is to remove the reference in tree building. | --add_reference |
*If you are running the alignment based workflow on more than 100 samples, it may be beneficial to set a partition size higher than the default of 100. More information can be found in the Parsnp 2.0 paper.
- Enter assembled FASTA genomes into a samplesheet.
- QUAST v5.2.0 is used to determine assembly quality if skip_quast is not indicated.
- QUAST results are summarized with a custom python script to increase readability.
- Historical Comparison
  - Mashtree v1.4.6 generates a phylogenetic tree using Mash distances.
- Fine scale Comparison
  - Parsnp v2.0.5 is used to perform a core genome alignment.
  - IQ-TREE2 v2.3.4 is used for inferring a phylogenetic tree. Bootstrapping in IQ-TREE2 requires at least 4 genomes; if fewer than 4 genomes are used, IQ-TREE2 will not perform bootstrapping.
  - Snp-dists v0.8.2 is used to calculate the SNP distance matrix.
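Because bootstrapping needs at least 4 genomes, it can be useful to check a samplesheet before launching a run. The sketch below is an illustration only and is not something Dryad ships: `check_samplesheet` is a hypothetical helper, the `sample,fasta` header is assumed, and whether the reference counts toward the 4 genomes is not stated here, so only samplesheet rows are counted.

```python
import csv
from pathlib import Path

# Minimum genome count for IQ-TREE2 bootstrapping, as stated in the workflow notes.
MIN_GENOMES_FOR_BOOTSTRAP = 4


def check_samplesheet(samplesheet):
    """Return (sample_count, missing_files, can_bootstrap) for a samplesheet."""
    with open(samplesheet, newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Relative paths are checked against the current working directory.
    missing = [row["fasta"] for row in rows if not Path(row["fasta"]).is_file()]
    return len(rows), missing, len(rows) >= MIN_GENOMES_FOR_BOOTSTRAP
```

If the third value returned is `False`, a tree may still be built, but without bootstrap support values.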
An example of Dryad's output directory structure for both alignment based and alignment free steps can be seen below. These directories will not include QUAST if --skip_quast is used:
alignment_based_output/
├── iqtree
│   └── parsnp.snps.mblocks.treefile
├── parsnp
│   └── parsnp_output
│       ├── parsnp.ggr
│       ├── parsnp.snps.mblocks
│       ├── parsnp.tree
│       └── parsnp.xmfa
├── pipeline_info
│   ├── *.html
│   ├── *.txt
│   └── samplesheet.valid.csv
├── quast
│   ├── *.quast.report.tsv
│   ├── *.transposed.quast.report.tsv
│   └── quast_results.tsv
└── snpdists
    └── snp_dists_matrix.tsv
alignment_free_output/
├── mashtree
│   └── mashtree.bootstrap.dnd
├── pipeline_info
│   ├── *.html
│   ├── *.txt
│   └── samplesheet.valid.csv
└── quast
    ├── *.quast.report.tsv
    ├── *.transposed.quast.report.tsv
    └── quast_results.tsv
Notable output files:
Alignment based
File | Output |
---|---|
quast_results.tsv* | Assembly quality results |
snp_dists_matrix.tsv | Pairwise SNP distances between isolates |
parsnp.snps.mblocks.treefile | Maximum likelihood phylogenetic tree |
*QUAST results will not be present if --skip_quast was used.
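As an illustration of how snp_dists_matrix.tsv might be used downstream, the sketch below lists isolate pairs that fall within a chosen SNP distance. It assumes the file is a square, tab-separated matrix whose first row and first column hold isolate names, with a label in the top-left cell, which is the usual snp-dists layout. The function names and the threshold are hypothetical and do not represent a recommended outbreak cutoff.

```python
import csv
from itertools import combinations


def read_snp_matrix(path):
    """Parse a square, tab-separated distance matrix into nested dicts."""
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    # The first cell of the header row is a label, not an isolate name.
    names = rows[0][1:]
    matrix = {row[0]: dict(zip(names, map(int, row[1:]))) for row in rows[1:]}
    return names, matrix


def close_pairs(names, matrix, max_snps):
    """Return (a, b, distance) for each unordered pair within max_snps, closest first."""
    pairs = [
        (a, b, matrix[a][b])
        for a, b in combinations(names, 2)
        if matrix[a][b] <= max_snps
    ]
    return sorted(pairs, key=lambda pair: pair[2])
```

For example, `close_pairs(*read_snp_matrix("snp_dists_matrix.tsv"), max_snps=10)` would return every pair of isolates separated by 10 SNPs or fewer.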
Alignment free
File | Output |
---|---|
quast_results.tsv* | Assembly quality results |
mashtree.bootstrap.dnd | Neighbor joining tree based on Mash distances |
*QUAST results will not be present if --skip_quast was used.
Warning
Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
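Nextflow's -params-file option accepts a JSON or YAML file, so a run's parameters can be written once and reused. The following is a minimal sketch of producing such a file; the helper name `write_params_file` and the file name are hypothetical, and the keys are the parameter names from the table above without their leading dashes.

```python
import json


def write_params_file(path, **params):
    """Write pipeline parameters to a JSON file, dropping unset (None) values."""
    cleaned = {key: value for key, value in params.items() if value is not None}
    with open(path, "w") as handle:
        json.dump(cleaned, handle, indent=2)
    return cleaned
```

For instance, `write_params_file("dryad_params.json", input="samplesheet.csv", outdir="results", alignment_free=True)` produces a file that could then be passed as `nextflow run wslh-bio/dryad -profile <profile> -params-file dryad_params.json`.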
Dryad was written by Dr. Kelsey Florek, Dr. Abigail C. Shockey, and Eva Gunawan.
We thank the bioinformatics group at the Wisconsin State Laboratory of Hygiene for all of their contributions.
If you would like to contribute to this pipeline, please see the contributing guidelines.
If you use Dryad for your analysis, please cite it using the following:
K. Florek, A.C. Shockey, & E. Gunawan (2014). Dryad (Version 4.0.0) [https://github.com/wslh-bio/dryad].
An extensive list of references for the tools used by Dryad can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.