
Amino Acid Variation

Robert J. Gifford edited this page Dec 19, 2024 · 3 revisions

Methodology for Determining Amino Acid Variation

  1. Feature Selection
    AAV-Atlas focuses on genomic features that encode amino acids. These features are identified by a metadata tag (CODES_AMINO_ACIDS) in the AAV-Atlas database. Each feature is associated with:

    • A reference genome (REF_MASTER_AAV2), which serves as the standard for comparison.
    • A set of aligned sequences from various sources, including public repositories like GenBank.
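    A minimal sketch of this filtering step follows. It assumes features are available as simple in-memory records. The `Feature` class, its `metatags` field, and the example feature names are hypothetical stand-ins, not the AAV-Atlas database API. Only the CODES_AMINO_ACIDS tag comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    # Hypothetical stand-in for a feature record; not the AAV-Atlas schema.
    name: str
    metatags: set = field(default_factory=set)


def coding_features(features):
    """Keep only features tagged as encoding amino acids."""
    return [f for f in features if "CODES_AMINO_ACIDS" in f.metatags]


# Hypothetical example features.
features = [
    Feature("VP1", {"CODES_AMINO_ACIDS"}),
    Feature("ITR", set()),
    Feature("Rep78", {"CODES_AMINO_ACIDS"}),
]
print([f.name for f in coding_features(features)])  # ['VP1', 'Rep78']
```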
  2. Alignment and Sequence Processing
    The analysis begins by processing multiple sequence alignments:

    • Alignments are organized hierarchically. Only tip alignments (those without child alignments) are considered, ensuring that the analysis targets the finest resolution of aligned sequences.
    • For each tip alignment, the script processes the member sequences over the region corresponding to the feature of interest.
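    Selecting tip alignments amounts to collecting the leaves of the alignment tree. The sketch below assumes that each alignment object exposes a list of its child alignments. The `Alignment` class, the `tip_alignments` helper, and the alignment names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alignment:
    # Hypothetical alignment node; children are nested alignments.
    name: str
    children: list = field(default_factory=list)


def tip_alignments(root):
    """Depth-first walk returning alignments that have no child alignments."""
    tips, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.children:
            # Push children in reverse so they are visited left to right.
            stack.extend(reversed(node.children))
        else:
            tips.append(node)
    return tips


root = Alignment("AL_ROOT", [
    Alignment("AL_CLADE_A", [Alignment("AL_A1"), Alignment("AL_A2")]),
    Alignment("AL_CLADE_B"),
])
print([a.name for a in tip_alignments(root)])  # ['AL_A1', 'AL_A2', 'AL_CLADE_B']
```

    An explicit stack avoids recursion limits on deep hierarchies. A recursive walk would be equally valid.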
  3. Codon and Amino Acid Comparison
    For every sequence within the alignment:

    • Amino acid residues are retrieved for each codon in the selected feature.
    • The retrieved residues are compared against those encoded in the reference genome.
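    The per-codon comparison can be sketched as follows. This assumes that the reference and query coding sequences have already been extracted from the alignment as equal-length, in-frame, gap-free strings. Translation uses the standard genetic code. For simplicity, codons are numbered from 1, although the codon labels used by AAV-Atlas may follow a different scheme. `compare_codons` is a hypothetical helper.

```python
# Standard genetic code (NCBI table 1), indexed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def compare_codons(ref_cds, query_cds):
    """Translate aligned coding sequences codon by codon and pair the residues.

    Codons not in the table (e.g. containing N or gaps) translate to 'X'.
    """
    pairs = []
    for start in range(0, len(ref_cds) - len(ref_cds) % 3, 3):
        ref_codon = ref_cds[start:start + 3].upper()
        query_codon = query_cds[start:start + 3].upper()
        pairs.append((
            start // 3 + 1,                     # 1-based codon number
            CODON_TABLE.get(ref_codon, "X"),    # reference residue
            CODON_TABLE.get(query_codon, "X"),  # query residue
            query_codon,
        ))
    return pairs


pairs = compare_codons("ATGGCTTGG", "ATGGATTGG")
print([p for p in pairs if p[1] != p[2]])  # [(2, 'A', 'D', 'GAT')]
```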
  4. Replacement Identification
    A replacement is recorded when:

    • The amino acid in the sequence differs from the reference residue.
    • The codon contains no ambiguous nucleotides (N), or any ambiguity still resolves to a single amino acid residue.
    • Both the reference and replacement residues are biologically meaningful (e.g., not a stop codon (*) or an unknown residue (X)).
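    The three rules can be expressed as a small filter. This sketch assumes that an ambiguous codon is resolved by expanding each N to A, C, G, and T. The codon is accepted only if every expansion encodes the same residue, which is one reasonable reading of the second rule. `resolve_codon` and `replacement_residue` are hypothetical names. Other IUPAC ambiguity codes are not handled.

```python
from itertools import product

# Standard genetic code (NCBI table 1), indexed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def resolve_codon(codon):
    """Return the single residue a codon encodes, expanding N; None if still ambiguous."""
    choices = ["ACGT" if base == "N" else base for base in codon.upper()]
    residues = {CODON_TABLE.get("".join(bases), "X") for bases in product(*choices)}
    return residues.pop() if len(residues) == 1 else None


def replacement_residue(ref_aa, query_codon):
    """Return the replacement residue if all three rules hold, otherwise None."""
    query_aa = resolve_codon(query_codon)
    if query_aa is None:                                  # ambiguity not resolvable
        return None
    if ref_aa in ("*", "X") or query_aa in ("*", "X"):    # stop or unknown residue
        return None
    return query_aa if query_aa != ref_aa else None       # must differ from reference


print(replacement_residue("A", "GAT"))  # D
print(replacement_residue("S", "CCN"))  # P  (CCA/CCC/CCG/CCT all encode Pro)
print(replacement_residue("E", "GAN"))  # None  (GAN could be Asp or Glu)
```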
  5. Annotation of Replacements
    Each replacement is annotated with:

    • The feature it belongs to.
    • The codon position and corresponding nucleotide sequence.
    • The reference amino acid and replacement amino acid.
    • A unique identifier (feature:refAA:codonLabel:replacementAA) for traceability.
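    The following sketch shows an annotation record and assumes one record per distinct replacement. The class and field names are hypothetical. Only the identifier layout (feature:refAA:codonLabel:replacementAA) is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementAnnotation:
    # Hypothetical field names mirroring the annotation list above.
    feature: str
    codon_label: str
    codon_nts: str
    ref_aa: str
    replacement_aa: str

    @property
    def identifier(self):
        """Build the feature:refAA:codonLabel:replacementAA identifier."""
        return f"{self.feature}:{self.ref_aa}:{self.codon_label}:{self.replacement_aa}"


# Hypothetical example values.
ann = ReplacementAnnotation("VP1", "101", "GAT", "A", "D")
print(ann.identifier)  # VP1:A:101:D
```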
  6. Biochemical Classification
    Replacements are further classified based on their biochemical properties using established metrics:

    • Hanada's Radical/Conservative Categories (2006): Classifies each replacement as radical or conservative, according to whether it changes biochemical properties such as charge and polarity.
    • Grantham Distance (1974): Quantifies the biochemical distance between two amino acids.
    • Miyata Distance (1979): Measures the physicochemical distance between two amino acids based on their volume and polarity.
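    The sketch below shows the general shape of these classifications rather than the published schemes. The radical/conservative test uses a simple charge grouping (R, H, and K positive; D and E negative; all others neutral). This grouping is an illustrative assumption, not a reproduction of Hanada's categories.

    Grantham and Miyata distances are both weighted Euclidean distances over physicochemical properties. The `property_distance` function implements that general form. The property values and weights must come from the original papers and are not included here. The values in the usage line are toy numbers.

```python
import math

# Illustrative charge grouping (assumption, not Hanada et al.'s exact scheme).
CHARGE = {**{aa: "positive" for aa in "RHK"}, **{aa: "negative" for aa in "DE"}}


def charge_class(aa):
    return CHARGE.get(aa, "neutral")


def classify_by_charge(ref_aa, alt_aa):
    """'radical' if the replacement changes charge class, else 'conservative'."""
    return "radical" if charge_class(ref_aa) != charge_class(alt_aa) else "conservative"


def property_distance(props_a, props_b, weights):
    """Weighted Euclidean distance: sqrt(sum_k w_k * (a_k - b_k)^2).

    Grantham- and Miyata-style distances share this form; supply the published
    property tables and weights to reproduce them.
    """
    return math.sqrt(sum(w * (props_a[k] - props_b[k]) ** 2 for k, w in weights.items()))


print(classify_by_charge("K", "E"))  # radical
print(classify_by_charge("L", "I"))  # conservative
# Toy property values, not real amino acid data:
print(property_distance({"p": 0.0, "v": 0.0}, {"p": 3.0, "v": 4.0}, {"p": 1.0, "v": 1.0}))  # 5.0
```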
  7. Database Integration
    Identified replacements are stored in custom tables within AAV-Atlas:

    • The primary table (aav_replacement) catalogs the replacements and their associated metadata.
    • A secondary table (aav_replacement_sequence) links replacements to the sequences in which they were observed.
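    The following is a minimal relational sketch of the two tables, using SQLite purely for illustration. The column names, the `record_replacement` helper, and the sequence identifiers are hypothetical, and the actual AAV-Atlas tables may be defined differently. The point is the one-to-many link: each replacement is stored once, and every sequence carrying it adds a row to the linking table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE aav_replacement (
    id             TEXT PRIMARY KEY,  -- feature:refAA:codonLabel:replacementAA
    feature        TEXT NOT NULL,
    codon_label    TEXT NOT NULL,
    ref_aa         TEXT NOT NULL,
    replacement_aa TEXT NOT NULL
);
CREATE TABLE aav_replacement_sequence (
    replacement_id TEXT NOT NULL REFERENCES aav_replacement(id),
    sequence_id    TEXT NOT NULL,
    PRIMARY KEY (replacement_id, sequence_id)
);
"""


def record_replacement(conn, feature, codon_label, ref_aa, alt_aa, sequence_id):
    """Insert the replacement once and link it to the sequence it was seen in."""
    rid = f"{feature}:{ref_aa}:{codon_label}:{alt_aa}"
    conn.execute(
        "INSERT OR IGNORE INTO aav_replacement VALUES (?, ?, ?, ?, ?)",
        (rid, feature, codon_label, ref_aa, alt_aa),
    )
    conn.execute(
        "INSERT OR IGNORE INTO aav_replacement_sequence VALUES (?, ?)",
        (rid, sequence_id),
    )
    return rid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for seq in ("seq_A", "seq_B"):  # hypothetical sequence IDs
    record_replacement(conn, "VP1", "101", "A", "D", seq)
print(conn.execute("SELECT COUNT(*) FROM aav_replacement").fetchone()[0])           # 1
print(conn.execute("SELECT COUNT(*) FROM aav_replacement_sequence").fetchone()[0])  # 2
```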
  8. Variation Creation
    Each replacement is linked to a variation object in the database, enabling downstream analyses such as:

    • Visualization: Mapping replacements to protein structures or genome browsers.
    • Queries: Retrieving replacements by feature, codon, or biochemical classification.
    • Phylogenetic Analysis: Studying replacement patterns across evolutionary lineages.
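    Downstream queries of this kind reduce to filtering variation records on one or more attributes. The sketch below assumes that each variation carries its feature, codon label, and a classification label. The `Variation` class, its fields, and the `query_variations` helper are hypothetical and not part of the AAV-Atlas API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variation:
    # Hypothetical record; field names are illustrative.
    feature: str
    codon_label: str
    ref_aa: str
    replacement_aa: str
    classification: str  # e.g. "radical" or "conservative"


def query_variations(variations, feature=None, codon_label=None, classification=None):
    """Return variations matching every criterion that is not None."""
    return [
        v for v in variations
        if (feature is None or v.feature == feature)
        and (codon_label is None or v.codon_label == codon_label)
        and (classification is None or v.classification == classification)
    ]


variations = [
    Variation("VP1", "101", "A", "D", "radical"),
    Variation("VP1", "102", "L", "I", "conservative"),
    Variation("Rep78", "20", "K", "E", "radical"),
]
hits = query_variations(variations, feature="VP1", classification="radical")
print([f"{v.feature}:{v.ref_aa}:{v.codon_label}:{v.replacement_aa}" for v in hits])  # ['VP1:A:101:D']
```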