Maximum Likelihood Clade Assignment (MLCA)
AAV-Atlas uses the Maximum Likelihood Clade Assignment (MLCA) method to assign adeno-associated virus (AAV) sequences to species and serotypes. MLCA is a genotyping framework built on the Evolutionary Placement Algorithm (EPA) implemented in the RAxML software suite.
MLCA enables efficient and accurate placement of new AAV sequences onto a fixed reference phylogeny without recalculating the entire tree, making it well-suited for large-scale genomic analyses. This genotyping capability forms a core part of AAV-Atlas, facilitating species and serotype identification for submitted sequences.
In AAV-Atlas, the MLCA process is executed using the maxLikelihoodGenotyper and maxLikelihoodPlacer modules.
The genotyping process in AAV-Atlas can be initiated through the command-line interface. Below is an example of using the MLCA genotyping module for AAV sequences:
GLUE> module aavMaxLikelihoodGenotyper genotype file -f example/test-seqs.fasta
This command processes the sequences in the specified FASTA file and outputs the assigned species and serotype clades for each sequence:
+===========+===================+====================+
| queryName | speciesFinalClade | serotypeFinalClade |
+===========+===================+====================+
| AX344105  | AL_Primate1       | AL_AAV2            |
| AX496953  | AL_Primate1       | AL_AAV2            |
| AX703462  | AL_Primate1       | AL_AAV2            |
| AX720902  | AL_Primate1       | AL_AAV2            |
| AX925291  | AL_Primate1       | AL_AAV2            |
| AX925550  | AL_Primate1       | AL_AAV2            |
| BD293519  | AL_Primate1       | AL_AAV2            |
| HV955994  | AL_Primate1       | AL_AAV2            |
| HZ796968  | AL_Primate1       | AL_AAV2            |
| LQ396120  | AL_Primate1       | AL_AAV2            |
| MP863866  | AL_Primate1       | AL_AAV2            |
| OF065946  | AL_Primate1       | AL_AAV2            |
| PC321937  | AL_Primate1       | AL_AAV2            |
| PE178192  | AL_Primate1       | AL_AAV2            |
| PF056792  | AL_Primate1       | AL_AAV2            |
| V01457    | AL_Primate1       | AL_AAV2            |
+===========+===================+====================+
In this example, every query sequence is assigned to the Primate1 species clade (AL_Primate1) and the AAV2 serotype clade (AL_AAV2).
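If the genotyping results feed a downstream analysis, it can help to turn the console table into records. The following is a minimal sketch, assuming the table text has been captured exactly as GLUE printed it. The function name parse_glue_table is hypothetical and is not part of GLUE or AAV-Atlas.

```python
def parse_glue_table(text):
    """Parse a GLUE-style console table into a list of dicts.

    Hypothetical helper for illustration only; it assumes rows start
    with '|' and border lines start with '+', as in the output above.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines (+====+) and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Two rows copied from the example output above.
SAMPLE = """
+===========+===================+====================+
| queryName | speciesFinalClade | serotypeFinalClade |
+===========+===================+====================+
| AX344105 | AL_Primate1 | AL_AAV2 |
| V01457 | AL_Primate1 | AL_AAV2 |
+===========+===================+====================+
"""

rows = parse_glue_table(SAMPLE)
for row in rows:
    print(row["queryName"], row["speciesFinalClade"], row["serotypeFinalClade"])
```

Because cells are stripped of surrounding whitespace, the parser works whether or not the columns are padded to equal width.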
MLCA operates through three primary stages: alignment, placement, and neighbor-weighting. Each stage is critical for accurately assigning query sequences to predefined clades.
- Alignment Stage: Query sequences are aligned against a curated set of reference AAV sequences. The alignment uses the MAFFT software with the --add and --keeplength options, ensuring query sequences integrate into the existing alignment without modifying the reference structure. This isolated alignment process prevents alterations to the primary alignment, preserving data integrity.
- Placement Stage: The extended alignment is analyzed in conjunction with a fixed reference tree. RAxML's EPA subsystem places each query sequence onto the reference tree at positions maximizing the likelihood of the overall structure. The algorithm retains a subset of high-likelihood placements for further evaluation.
- Neighbor-Weighting Stage: This final stage computes the evolutionary distances between query sequences and the closest reference sequences, assessing the likelihood of each placement. Sequences are assigned to species and serotype clades based on proximity to reference sequences, with placements weighted by evolutionary distance. If the calculated weight exceeds a threshold, the query is assigned to the corresponding clade.
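The neighbor-weighting idea can be illustrated with a small sketch. This is not the GLUE implementation. The inverse-distance weighting, the scaling by a placement likelihood weight, the 0.5 threshold, and all names (Neighbor, assign_clade, CLADE_B) are assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Neighbor:
    """A reference sequence near a candidate placement (hypothetical structure)."""
    clade: str               # clade of the reference, e.g. "AL_AAV2"
    distance: float          # evolutionary distance from the placement to the reference
    placement_weight: float  # likelihood weight of the placement this neighbor came from


def assign_clade(neighbors: Iterable[Neighbor],
                 threshold: float = 0.5,
                 epsilon: float = 1e-6) -> Optional[str]:
    """Return the clade whose share of the total weight exceeds `threshold`.

    Assumed scheme: each neighbor contributes placement_weight / distance,
    so close references on likely placements count the most.
    """
    clade_weights = defaultdict(float)
    for n in neighbors:
        # epsilon avoids division by zero for identical sequences
        clade_weights[n.clade] += n.placement_weight / (n.distance + epsilon)

    total = sum(clade_weights.values())
    if total == 0:
        return None  # no neighbors, so nothing to assign

    best = max(clade_weights, key=clade_weights.get)
    share = clade_weights[best] / total
    return best if share > threshold else None


# A close neighbor on a well-supported placement dominates a distant one.
# "CLADE_B" is a made-up clade name used only for contrast.
clear_case = [
    Neighbor("AL_AAV2", distance=0.01, placement_weight=0.9),
    Neighbor("CLADE_B", distance=0.20, placement_weight=0.1),
]
print(assign_clade(clear_case))
```

Under this scheme a query is left unassigned when no clade holds a clear majority of the weight, which mirrors the thresholding behaviour described in the text without claiming to reproduce its exact formula.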
The integration of MLCA into AAV-Atlas provides a scalable and efficient tool for AAV genotyping. By leveraging RAxML's EPA feature and the structured MLCA workflow, the process delivers:
- High accuracy in species and serotype identification.
- Computational efficiency, minimizing the need for repeated full phylogenetic tree construction.
- Applicability to large-scale sequence datasets.
AAV Atlas by Robert J Gifford Lab.
For questions, issues, or feedback, please open an issue on the GitHub repository.
For collaboration, please contact Dr Robert Gifford.