Major release: new reassortment inference method and handling of datasets with non-uniform strain labels

@markin-alex released this 22 Oct 23:04
  • New reassortment inference method: mincut. The new method is more robust than the current 'local' method in many instances and never produces "uncertain" reassortment inferences (those marked with the '?' annotation). It uses the reassortment test to cut the reference phylogeny into the smallest possible number of non-reassorting parts, with theoretical guarantees on optimality. To use the new method, specify -m mincut.
  • Handling of datasets with non-uniform strain headers across trees and alignments. Previously, TreeSort required strain labels to match exactly across all alignments and trees. Now the alignments can differ in size, and TreeSort will automatically find the common set of strains. Further, if strain labels differ between segments, you can use the --match-on-strain or --match-on-epi options to let TreeSort match the segments automatically, either by strain name (e.g., "A/swine/Iowa/2384/2015") or by EPI_ISL ID for GISAID data. The remaining parts of the deflines can stay non-uniform.
  • Automatically parse and prepare your dataset for TreeSort analysis. The prepare_dataset.sh script takes a standard flu fasta file containing a mix of segments (downloaded from, e.g., GISAID, GenBank, or other sources), splits it by segment, builds a tree for each segment, and writes a descriptor file for TreeSort. The script requires standard "|" (pipe) delimiters between the fields of the fasta headers, with the segment identifier as one of the middle fields. E.g., "EPI_ISL_xxx|A/Texas/33/2017|NA|2017-02-18". Note that dates are also required, both by this script and by TreeSort in general.
    Note: The script only builds "rough" FastTree trees for each of the segments. For best results from TreeSort, it's best to infer the reference tree with another maximum likelihood method, e.g., RAxML or IQ-Tree. The tree can then be rooted with the treetime-root.py script provided in this repo.
  • TreeSort can now be installed via pip. Just run pip install treesort.
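
To illustrate what matching on strain name or EPI_ISL ID means for pipe-delimited headers, here is a minimal Python sketch. It is not TreeSort's actual implementation: the function names (header_key, common_strains), the rule for recognizing a strain name (a field with at least three "/" characters), and the example headers and IDs are all assumptions made up for illustration.

```python
import re

# Assumed ID shape for illustration: "EPI_ISL_" followed by digits.
EPI_PATTERN = re.compile(r"EPI_ISL_\d+")


def header_key(header, match_on="strain"):
    """Return the matching key found in a pipe-delimited fasta header.

    match_on="strain": first field that looks like a strain name
                       (heuristic: at least three '/' characters).
    match_on="epi":    first field that looks like an EPI_ISL ID.
    Returns None when the header has no such field.
    """
    fields = header.lstrip(">").strip().split("|")
    for field in fields:
        if match_on == "epi" and EPI_PATTERN.fullmatch(field):
            return field
        if match_on == "strain" and field.count("/") >= 3:
            return field
    return None


def common_strains(headers_by_segment, match_on="strain"):
    """Intersect the matching keys across all segments."""
    key_sets = []
    for headers in headers_by_segment.values():
        keys = {header_key(h, match_on) for h in headers}
        keys.discard(None)  # ignore headers without a usable key
        key_sets.append(keys)
    return set.intersection(*key_sets) if key_sets else set()


# Hypothetical headers: the two segments use different defline layouts.
headers = {
    "HA": [
        "EPI_ISL_101|A/Texas/33/2017|HA|2017-02-18",
        "EPI_ISL_102|A/swine/Iowa/2384/2015|HA|2015-06-01",
    ],
    "NA": [
        "A/Texas/33/2017|NA|2017-02-18|extra",
        "EPI_ISL_103|A/Ohio/1/2016|NA|2016-01-05",
    ],
}

print(sorted(common_strains(headers, "strain")))
print(sorted(common_strains(headers, "epi")))
```

In this toy dataset only A/Texas/33/2017 appears in both segments, so it is the only strain kept when matching on strain name. Matching on EPI_ISL ID yields nothing here, because the two segments share no ID.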