Written by: Yunyi Cheng; edited by: Clare Gillis
MUSCLE is a computer program for creating multiple sequence alignments of proteins. It incorporates fast distance estimation using k-mer counting, progressive alignment using a profile function its author calls the log-expectation score, and refinement using tree-dependent restricted partitioning. Its alignments can be used for phylogenetic tree estimation, structure prediction and critical residue identification, all of which are useful for virus identification and discovery.
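To make the "k-mer counting" step more concrete, here is a minimal Python sketch of a k-mer distance between two unaligned sequences. It assumes the fractional common k-mer count F (shared k-mers, each counted at most as often as it occurs in the sequence where it is rarer, divided by the number of k-mers in the shorter sequence) with distance 1 - F. The function names and the default k = 3 are illustrative choices, and MUSCLE itself differs in details not shown here.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """Return 1 - F, where F is the fractional common k-mer count.

    F = (sum over k-mers t of min(count of t in x, count of t in y))
        / (number of k-mers in the shorter sequence)
    This is an illustrative sketch, not MUSCLE's own implementation.
    """
    shorter = min(len(x), len(y))
    if shorter < k:
        raise ValueError("both sequences must be at least k residues long")
    cx, cy = kmer_counts(x.upper(), k), kmer_counts(y.upper(), k)
    # Counter '&' keeps each shared k-mer with the smaller of its two counts.
    shared = sum((cx & cy).values())
    f = shared / (shorter - k + 1)
    return 1.0 - f


# Toy example: identical sequences are at distance 0.
print(kmer_distance("ACDEFGHIK", "ACDEFGHIK"))  # 0.0
```

Because a distance like this needs no alignment, it is cheap to compute for every pair of input sequences, which makes it a good fit for the fast first stage of a progressive aligner.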
Tutorial Objective: We will use MUSCLE to produce a multiple sequence alignment of hemoglobin subunit sequences from humans, house mice and goats.
- Tool Weblink
- Link to example data formats
- Web browser
- Three or more sequences of interest in GCG, FASTA, EMBL (nucleotide only), GenBank, PIR, NBRF, PHYLIP, or UniProtKB/Swiss-Prot (protein only) format, pasted directly or uploaded as a file.
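If you prepare your input as FASTA, a quick local check can catch common problems before uploading. The sketch below is a hypothetical helper (not part of MUSCLE or the web tool) that parses FASTA text and confirms it contains at least three non-empty sequences. The toy sequences in the usage lines are placeholders, not real hemoglobin data.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    A record starts at a line beginning with '>'; the lines that follow,
    up to the next '>', are joined into one sequence.
    """
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_msa_input(text: str, minimum: int = 3) -> list[tuple[str, str]]:
    """Hypothetical pre-upload check: at least `minimum` non-empty records."""
    records = parse_fasta(text)
    if len(records) < minimum:
        raise ValueError(f"need at least {minimum} sequences, found {len(records)}")
    for header, seq in records:
        if not seq:
            raise ValueError(f"record {header!r} has no sequence")
    return records


example = """>seq1 toy
ACDEFG
HIK
>seq2 toy
ACDEFGHIK
>seq3 toy
ACDQFGHIK
"""
for header, seq in check_msa_input(example):
    print(header, len(seq))
```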
The multiple sequence alignment result is displayed in a browser tab with four sections: Alignment, Tool Output, Phylogenetic Tree, and Result Viewers. The output format is selected from a dropdown list of seven options:
- Pearson/FASTA: Plain-text format for storing sequences; each record starts with a header line beginning with ">", followed by the sequence.
- ClustalW: Alignment output in the style of the Clustal family of tools, organized in blocks.
- ClustalW (strict): Similar to standard ClustalW, but with stricter output conventions.
- HTML: The alignment rendered as HTML, typically with color-coded residues for clarity.
- GCG MSF: The Multiple Sequence Format (MSF) developed for the GCG suite of bioinformatics tools.
- Phylip interleaved: A compact format used by the PHYLIP suite of programs, in which the sequences are interleaved, block by block, across lines.
- Phylip sequential: Similar to Phylip interleaved, but each sequence is written out in full before the next one begins.
For the sake of demonstration, we will choose Pearson/FASTA as the output format.
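Once the Pearson/FASTA alignment has been downloaded, it can be read back in for further analysis. This is a minimal sketch that assumes gaps are written as "-" and that a record's aligned sequence may span several lines; the function name is hypothetical. It also checks that every row has the same length, since all rows of an alignment span the same columns.

```python
def parse_aligned_fasta(text: str) -> dict[str, str]:
    """Parse Pearson/FASTA alignment text into {header: aligned row}.

    Gaps are assumed to be written as '-'. Every row of an alignment
    must have the same length (the number of alignment columns).
    """
    rows: dict[str, str] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if header in rows:
                raise ValueError(f"duplicate header {header!r}")
            rows[header] = ""
        elif header is None:
            raise ValueError("alignment text must start with a '>' header")
        else:
            rows[header] += line  # join wrapped sequence lines
    lengths = {len(r) for r in rows.values()}
    if len(lengths) > 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    return rows


aln = parse_aligned_fasta(">x\nAC-DE\n>y\nACGDE\n>z\nA--DE\n")
for name, row in aln.items():
    # Removing the gap characters recovers each original, unaligned sequence.
    print(name, row, row.replace("-", ""))
```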
1. Navigate to MUSCLE Web Tool
- Alignment: Visualization of the alignment results, with amino acids in different colours and buttons for zooming in and out.
- Tool Output: The raw output of the tool is shown in this section; users can Download the tool output or Show alignment with colours.
- Phylogenetic Tree: Shows the evolutionary relationships among the input sequences, with a slider for zooming in and out.
- Result Viewers: Links to related result viewers for investigating the results further.
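The tree is drawn by the tool itself; if you also want a numeric summary of how similar each pair of aligned sequences is, pairwise percent identity is a simple option. This sketch is not how MUSCLE or the web tool computes the tree. It uses one common convention (compare only columns where neither row has a gap), other definitions of percent identity exist, and the function names and toy rows are illustrative.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent of compared columns in which the two residues match.

    Columns where either row has a gap are skipped (one common convention).
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    same = sum(1 for x, y in compared if x.upper() == y.upper())
    return 100.0 * same / len(compared)


def identity_table(rows: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of aligned rows."""
    return {(p, q): percent_identity(rows[p], rows[q]) for p, q in combinations(rows, 2)}


rows = {"x": "ACDE-G", "y": "ACDQ-G", "z": "AC-EFG"}  # toy aligned rows
for (p, q), pid in identity_table(rows).items():
    print(f"{p} vs {q}: {pid:.1f}%")
```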
That's it! You've used MUSCLE to produce a multiple sequence alignment of hemoglobin subunit sequences from humans, house mice and goats!
In this example, we can see that positions 102-111 are highly conserved across the three proteins, which suggests they may have an important function. We can gather stronger evidence for this hypothesis by aligning other, similar sequences alongside these and checking whether positions 102-111 are also conserved in the new sequences.
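To check a region like this more systematically, you can scan the aligned rows for runs of columns in which every sequence has the same residue. The following is a minimal sketch with hypothetical names; it reports 1-based alignment column numbers, which match residue numbers in an individual sequence only if no gaps occur before that point in its row.

```python
def conserved_runs(rows: list[str], min_len: int = 1, gap: str = "-") -> list[tuple[int, int]]:
    """Return 1-based (start, end) column ranges where every row has the
    same residue and no row has a gap, keeping runs of at least min_len."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("need one or more aligned rows of equal length")
    runs: list[tuple[int, int]] = []
    start = None
    for i, column in enumerate(zip(*rows), start=1):
        conserved = column[0] != gap and all(c == column[0] for c in column)
        if conserved and start is None:
            start = i  # a new conserved run begins
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last column.
    if start is not None and len(rows[0]) - start + 1 >= min_len:
        runs.append((start, len(rows[0])))
    return runs


print(conserved_runs(["ACDEFG", "ACDQFG", "ACDEFG"]))  # [(1, 3), (5, 6)]
```

Adding more related sequences and re-running a check like this shows whether a conserved run survives as the alignment grows.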
For viruses, users can follow the same steps, substituting real viral protein sequences for the example data, to investigate conserved regions, variants, and evolutionary relationships among viral proteins.
- MUSCLE: multiple sequence alignment with high accuracy and high throughput
- Download source code
- Documentation
- Home page
- For other multiple sequence alignment tools see: EMBL Multiple Sequence Alignment