Amino Acid Variation
-
Feature Selection
AAV-Atlas focuses on genomic features that encode amino acids. These features are identified based on metadata tags (`CODES_AMINO_ACIDS`) in the AAV-Atlas database. Each feature is associated with:
- A reference genome (`REF_MASTER_AAV2`), which serves as the standard for comparison.
- A set of aligned sequences from various sources, including public repositories like GenBank.
-
Alignment and Sequence Processing
The analysis begins by processing multiple sequence alignments:
- Alignments are organized hierarchically. Only tip alignments (those without child alignments) are considered, ensuring the analysis targets the finest resolution of aligned sequences.
- For each alignment, the script processes member sequences corresponding to the feature of interest.
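As a minimal sketch of tip selection, assume the hierarchy is available as a mapping from each alignment name to its parent. The alignment names and the `tip_alignments` helper are hypothetical and are not taken from AAV-Atlas.

```python
def tip_alignments(parent_of):
    """Return the alignments that no other alignment names as its parent."""
    parents = {p for p in parent_of.values() if p is not None}
    return sorted(name for name in parent_of if name not in parents)


# Hypothetical hierarchy: alignment -> parent alignment (None for the root).
hierarchy = {
    "AL_ROOT": None,
    "AL_CLADE_A": "AL_ROOT",
    "AL_CLADE_B": "AL_ROOT",
    "AL_CLADE_A1": "AL_CLADE_A",
}

print(tip_alignments(hierarchy))  # ['AL_CLADE_A1', 'AL_CLADE_B']
```

Here `AL_CLADE_A` is skipped because it has a child, while `AL_CLADE_B` and `AL_CLADE_A1` are tips.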
-
Codon and Amino Acid Comparison
For every sequence within the alignment:
- Amino acid residues are retrieved for each codon in the selected feature.
- The retrieved residues are compared against those encoded in the reference genome.
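The comparison can be sketched as follows, assuming the member sequence is already aligned to the reference, ungapped, and in frame, so that codon `i` of one corresponds to codon `i` of the other. The helper names and the example sequences are illustrative; the real pipeline works on alignment coordinates.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(codon):
    """Translate one codon; anything not in the table becomes 'X'."""
    return CODON_TABLE.get(codon.upper(), "X")


def compare_to_reference(ref_nts, seq_nts):
    """Yield (codon_number, ref_aa, seq_aa) for each complete codon."""
    n_codons = min(len(ref_nts), len(seq_nts)) // 3
    for i in range(n_codons):
        ref_codon = ref_nts[3 * i:3 * i + 3]
        seq_codon = seq_nts[3 * i:3 * i + 3]
        yield i + 1, translate(ref_codon), translate(seq_codon)


# Hypothetical reference and member sequence for a three-codon feature.
for row in compare_to_reference("ATGGCTGAC", "ATGGTTGAC"):
    print(row)
```

In this example only the second codon differs (`GCT` encodes A, `GTT` encodes V).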
-
Replacement Identification
A replacement is recorded when:
- The amino acid in the sequence differs from the reference residue.
- The codon does not contain ambiguous nucleotides (`N`), or the replacement is unambiguously a single amino acid residue.
- Both the reference and replacement residues are biologically meaningful (e.g., not `*` or `X`).
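A minimal sketch of this rule, assuming all three conditions must hold together. The function name and the `single_residue` flag are hypothetical: the flag stands for "this codon can only translate to one residue despite its ambiguity".

```python
EXCLUDED = {"*", "X"}  # stop codon and unknown residue


def is_recorded(ref_aa, seq_aa, codon, single_residue=False):
    """Return True when a codon should be recorded as a replacement."""
    differs = seq_aa != ref_aa
    # Accept codons without N, or ambiguous codons that still resolve to one residue.
    unambiguous = "N" not in codon.upper() or single_residue
    meaningful = ref_aa not in EXCLUDED and seq_aa not in EXCLUDED
    return differs and unambiguous and meaningful


print(is_recorded("A", "V", "GTT"))                       # True
print(is_recorded("A", "A", "GCT"))                       # False: same residue
print(is_recorded("A", "X", "GNT"))                       # False: ambiguous codon
print(is_recorded("A", "L", "CTN", single_residue=True))  # True: CTN is always L
```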
-
Annotation of Replacements
Each replacement is annotated with:
- The feature it belongs to.
- The codon position and corresponding nucleotide sequence.
- The reference amino acid and replacement amino acid.
- A unique identifier (`feature:refAA:codonLabel:replacementAA`) for traceability.
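The annotation fields and the identifier format can be illustrated with a small record type. The class name, field names, and example values (feature `VP1`, codon `42`) are assumptions made for this sketch; only the `feature:refAA:codonLabel:replacementAA` pattern comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replacement:
    feature: str         # feature the replacement belongs to
    codon_label: str     # codon position within the feature
    codon_nts: str       # nucleotides observed at that codon
    ref_aa: str          # residue in the reference genome
    replacement_aa: str  # residue observed in the member sequence

    @property
    def identifier(self) -> str:
        """Build the feature:refAA:codonLabel:replacementAA identifier."""
        return f"{self.feature}:{self.ref_aa}:{self.codon_label}:{self.replacement_aa}"


example = Replacement("VP1", "42", "GTT", "A", "V")
print(example.identifier)  # VP1:A:42:V
```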
-
Biochemical Classification
Replacements are further classified based on their biochemical properties using established metrics:
- Hanada's Radical/Conservative Categories (2006): Assesses changes in biochemical properties such as charge and polarity.
- Grantham Distance (1974): Quantifies the biochemical distance between two amino acids.
- Miyata Distance (1979): Measures the physicochemical distance between amino acids, based on differences in polarity and volume.
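To illustrate what a radical/conservative call looks like, the sketch below groups residues by charge (positive: R, H, K; negative: D, E; neutral: all others) and treats a change between groups as radical. This is one commonly used grouping. Whether AAV-Atlas applies exactly this scheme is an assumption, and the function names are hypothetical.

```python
POSITIVE = set("RHK")
NEGATIVE = set("DE")


def charge_group(aa):
    """Assign a residue to a charge group."""
    if aa in POSITIVE:
        return "positive"
    if aa in NEGATIVE:
        return "negative"
    return "neutral"


def classify_by_charge(ref_aa, replacement_aa):
    """Label a replacement radical if it crosses charge groups."""
    same_group = charge_group(ref_aa) == charge_group(replacement_aa)
    return "conservative" if same_group else "radical"


print(classify_by_charge("K", "R"))  # conservative: both positive
print(classify_by_charge("D", "K"))  # radical: negative to positive
print(classify_by_charge("A", "E"))  # radical: neutral to negative
```

A polarity-based grouping would follow the same pattern with different residue sets.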
-
Database Integration
Identified replacements are stored in custom tables within AAV-Atlas:
- The primary table (`aav_replacement`) catalogs the replacements and their associated metadata.
- A secondary table (`aav_replacement_sequence`) links replacements to the sequences in which they were observed.
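The relationship between the two tables can be sketched with an in-memory SQLite database. SQLite stands in here for whatever database engine AAV-Atlas actually uses. The table names come from the description above, but every column name and example row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aav_replacement (
    id             TEXT PRIMARY KEY,  -- feature:refAA:codonLabel:replacementAA
    feature        TEXT,
    codon_label    TEXT,
    reference_aa   TEXT,
    replacement_aa TEXT
);
CREATE TABLE aav_replacement_sequence (
    replacement_id TEXT REFERENCES aav_replacement(id),
    sequence_id    TEXT,
    PRIMARY KEY (replacement_id, sequence_id)
);
""")

# One replacement observed in two (made-up) sequences.
conn.execute(
    "INSERT INTO aav_replacement VALUES (?, ?, ?, ?, ?)",
    ("VP1:A:42:V", "VP1", "42", "A", "V"),
)
conn.executemany(
    "INSERT INTO aav_replacement_sequence VALUES (?, ?)",
    [("VP1:A:42:V", "SEQ_0001"), ("VP1:A:42:V", "SEQ_0002")],
)

# Count the sequences carrying each replacement in a given feature.
rows = conn.execute(
    """
    SELECT r.id, COUNT(s.sequence_id)
    FROM aav_replacement AS r
    JOIN aav_replacement_sequence AS s ON s.replacement_id = r.id
    WHERE r.feature = ?
    GROUP BY r.id
    """,
    ("VP1",),
).fetchall()
print(rows)  # [('VP1:A:42:V', 2)]
```

Keeping the sequence links in a separate table means each replacement is stored once, however many sequences carry it.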
-
Variation Creation
Each replacement is linked to a variation object in the database, enabling downstream analyses such as:
- Visualization: Mapping replacements to protein structures or genome browsers.
- Queries: Retrieving replacements by feature, codon, or biochemical classification.
- Phylogenetic Analysis: Studying replacement patterns across evolutionary lineages.
AAV Atlas by Robert J Gifford Lab.
For questions, issues, or feedback, please open an issue on the GitHub repository.
For collaboration, please contact Dr Robert Gifford.