Dryad

License: GPL-3.0

Dryad is a Nextflow pipeline for examining prokaryote relatedness. Dryad can perform a reference free analysis and/or SNP analysis.

Dryad analyzes fasta files that have been processed either by Spriggan or by PHoeNIx. Dryad is split into two major workflows:

  1. A workflow dedicated to fine scale investigation of a single outbreak. This workflow uses a reference to determine relatedness and SNP distances. The reference can be removed from the alignment based workflow to create a phylogenetic tree that gives a high resolution look at a single outbreak.
  2. A workflow dedicated to identifying historical relatedness across multiple years and multiple outbreaks without the use of a reference. This alignment free workflow gives a low resolution look at historical relatedness.

Table of Contents:

Usage
Input
Parameters
Workflow
Output
Credits
Contributions and Support
Citations

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data. Dryad requires Nextflow version 24.04.2.5914 or later.

To run an alignment free comparison, use:

nextflow run wslh-bio/dryad \
   -latest \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --alignment_free

Alternatively, to run an alignment based comparison, use:

nextflow run wslh-bio/dryad \
   -latest \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --fasta <REFERENCE_FASTA> \
   --alignment_based 

To run both an alignment based and an alignment free comparison, use:

nextflow run wslh-bio/dryad \
   -latest \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --fasta <REFERENCE_FASTA> \
   --alignment_based \
   --alignment_free

  • Nextflow caches previously run pipelines, which can result in an older version of a pipeline being used. To get the most up-to-date version of a pipeline like Dryad, use the -latest option.

Input

Prepare a samplesheet for your input data in which each row represents one fasta file. The samplesheet should look as follows:

samplesheet.csv:

sample,fasta
sample_1,2024_1.contigs.fa
sample_2,2024_2.contigs.fa
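
For a large batch of assemblies, the samplesheet can be generated rather than typed by hand. The following is a minimal sketch, not part of Dryad itself. It assumes the two column names shown above (sample and fasta), and the function name, the accepted file suffixes, the use of absolute paths, and the rule for deriving a sample name from the file name are all hypothetical choices made for illustration.

```python
import csv
from pathlib import Path


def write_samplesheet(fasta_dir, out_csv, suffixes=(".fa", ".fasta", ".fna")):
    """Write a two-column samplesheet (sample,fasta) for the FASTA files in fasta_dir."""
    rows = []
    for path in sorted(Path(fasta_dir).iterdir()):
        if path.is_file() and path.suffix in suffixes:
            # Assumption: the sample name is the file name up to the first dot,
            # e.g. 2024_1.contigs.fa -> 2024_1. Rename as needed.
            sample = path.name.split(".")[0]
            rows.append((sample, str(path.resolve())))

    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sample", "fasta"])  # header taken from the example above
        writer.writerows(rows)
    return rows


if __name__ == "__main__":
    # Hypothetical directory name; point this at your own assemblies.
    for sample, fasta in write_samplesheet("assemblies", "samplesheet.csv"):
        print(sample, fasta)
```

The resulting file can then be passed to the pipeline with --input samplesheet.csv.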

Parameters

Dryad's main parameters and their defaults are shown in the table below:

| Parameter | Parameter description and defaults | Example usage |
| --- | --- | --- |
| input | Path to comma-separated file containing information about the samples in the experiment. | --input <PATH_TO_SAMPLESHEET> |
| outdir | Output directory where the results will be saved. An absolute path must be used for storage on cloud infrastructure. | --outdir <DESIRED_OUTPUT_PATH> |
| profile | Denotes how to access containerized software. | -profile aws |
| fasta | Reference fasta used for alignment based comparisons. Default is no reference fasta. | --fasta <PATH_TO_REF_FASTA> |
| alignment_based | Performs a fine scale analysis within a single outbreak. | --alignment_based |
| alignment_free | Performs a historical analysis across multiple years and outbreaks. | --alignment_free |
| task.cpus | Denotes how many CPUs to use for Mashtree. Default task.cpus is 2. | --task.cpus 4 |
| cg_tree_model | Tells IQ-TREE what model to use. Default cg_tree_model is GTR+G. | --cg_tree_model "GTR+G" |
| parsnp_partition | Tells Parsnp the minimum partition size, or not to partition. Default is --no-partition.* | --parsnp_partition "--min-partition-size 50" |
| skip_quast | Skips QUAST and the summary steps. Use this if the data was already run through PHoeNIx or another pipeline with a quality check. Default is to run QUAST as if quality summaries were not previously run. | --skip_quast |
| add_reference | Adds the reference into tree building for IQ-TREE. Default is to remove the reference in tree building. | --add_reference |

*If you are running an alignment based workflow on >100 samples, it may be beneficial to use a partitioning value higher than the default of 100. More information can be found in the Parsnp 2.0 paper.

Workflow

1. Universal Steps

  • Enter assembled FASTA genomes into a samplesheet.
  • QUAST v5.2.0 is used to determine assembly quality if skip_quast is not indicated.
  • QUAST results are summarized with a custom python script to increase readability.

2. Comparison Steps

  • Historical Comparison
  • Fine scale Comparison
    • Parsnp v2.0.5 is used to perform a core genome alignment.
    • IQ-TREE2 v2.3.4 is used for inferring a phylogenetic tree.
    • Bootstrapping in IQ-TREE2 requires at least 4 genomes. If fewer than 4 genomes are used, IQ-TREE2 will not perform bootstrapping.
    • Snp-dists v0.8.2 is used to calculate the SNP distance matrix.
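
Because bootstrapping needs at least 4 genomes, it can help to count the samples before launching a fine scale comparison. The sketch below is an illustration only and is not part of Dryad. It assumes a comma-separated samplesheet with a sample column, and the function names are hypothetical. The source does not say whether the reference counts toward the 4 genomes, so the sketch counts samplesheet rows only.

```python
import csv

# Threshold stated in the workflow notes for IQ-TREE2 bootstrapping.
MIN_GENOMES_FOR_BOOTSTRAP = 4


def count_samples(samplesheet_path):
    """Count the rows of the samplesheet that have a non-empty sample name."""
    with open(samplesheet_path, newline="") as handle:
        return sum(1 for row in csv.DictReader(handle) if row.get("sample"))


def bootstrapping_expected(n_genomes):
    """Return True when there are enough genomes for bootstrapping to run."""
    return n_genomes >= MIN_GENOMES_FOR_BOOTSTRAP


if __name__ == "__main__":
    n = count_samples("samplesheet.csv")
    if not bootstrapping_expected(n):
        print(f"Only {n} genomes: expect a tree without bootstrap support values.")
```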

Output

An example of Dryad's output directory structure for both alignment based and alignment free steps can be seen below. These directories will not include QUAST if --skip_quast is used:

alignment_based_output/
├── iqtree
│   └── parsnp.snps.mblocks.treefile
├── parsnp
│   └── parsnp_output
│       ├── parsnp.ggr
│       ├── parsnp.snps.mblocks
│       ├── parsnp.tree
│       └── parsnp.xmfa
├── pipeline_info
│   ├── *.html
│   ├── *.txt
│   └── samplesheet.valid.csv
├── quast
│   ├── *.quast.report.tsv
│   ├── *.transposed.quast.report.tsv
│   └── quast_results.tsv
└── snpdists
    └── snp_dists_matrix.tsv
alignment_free_output/
├── mashtree
│   └── mashtree.bootstrap.dnd
├── pipeline_info
│   ├── *.html
│   ├── *.txt
│   └── samplesheet.valid.csv
└── quast
    ├── *.quast.report.tsv
    ├── *.transposed.quast.report.tsv
    └── quast_results.tsv

Notable output files:

Alignment based

| File | Output |
| --- | --- |
| quast_results.tsv* | Assembly quality results |
| snp_dists_matrix.tsv | SNP distances between each pair of isolates |
| parsnp.snps.mblocks.treefile | Maximum likelihood phylogenetic tree |

*QUAST results will not be present if --skip_quast was used.
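
The SNP distance matrix is often easier to read as a list of closely related pairs. The sketch below is one possible way to do that and is not part of Dryad. It assumes snp_dists_matrix.tsv is a square, tab-separated matrix whose first row holds the sample names and whose rows are in the same order as the columns. The function name and the threshold of 10 SNPs are hypothetical and are not an outbreak definition.

```python
import csv
import io


def close_pairs(matrix_text, max_snps):
    """Return (sample_a, sample_b, distance) for each pair at or below max_snps."""
    rows = [r for r in csv.reader(io.StringIO(matrix_text), delimiter="\t") if r]
    names = rows[0][1:]  # the first cell of the header row is a label, not a sample
    pairs = []
    for i, row in enumerate(rows[1:]):
        # Read only the upper triangle so each pair is reported once.
        for j in range(i + 1, len(names)):
            distance = int(row[j + 1])
            if distance <= max_snps:
                pairs.append((row[0], names[j], distance))
    return sorted(pairs, key=lambda pair: pair[2])


if __name__ == "__main__":
    # Path follows the output directory layout shown above.
    with open("alignment_based_output/snpdists/snp_dists_matrix.tsv") as handle:
        for a, b, d in close_pairs(handle.read(), max_snps=10):
            print(f"{a}\t{b}\t{d}")
```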

Alignment free

| File | Output |
| --- | --- |
| quast_results.tsv* | Assembly quality results |
| mashtree.bootstrap.dnd | Neighbor joining tree based on mash distances |

*QUAST results will not be present if --skip_quast was used.

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
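
As an illustration of the -params-file route, the sketch below writes a JSON parameters file, which Nextflow accepts alongside YAML. It is not part of Dryad. The parameter names come from the table in the Parameters section, while the function name, the file name params.json, and the example values are hypothetical.

```python
import json


def write_params_file(path, **params):
    """Write pipeline parameters to a JSON file, skipping any that are None."""
    cleaned = {key: value for key, value in params.items() if value is not None}
    with open(path, "w") as handle:
        json.dump(cleaned, handle, indent=2)
    return path


if __name__ == "__main__":
    write_params_file(
        "params.json",
        input="samplesheet.csv",
        outdir="results",    # hypothetical output directory
        fasta=None,          # no reference: dropped from the file
        alignment_free=True,
    )
    # Then run, for example:
    #   nextflow run wslh-bio/dryad -latest -profile docker -params-file params.json
```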

Credits

Dryad was written by Dr. Kelsey Florek, Dr. Abigail C. Shockey, and Eva Gunawan.

We thank the bioinformatics group at the Wisconsin State Laboratory of Hygiene for all of their contributions.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

If you use Dryad for your analysis, please cite it using the following:

K. Florek, A.C. Shockey, & E. Gunawan (2014). Dryad (Version 4.0.0) [https://github.com/wslh-bio/dryad].

An extensive list of references for the tools used by Dryad can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
