MUSCLE

written by: Yunyi Cheng edited by: Clare Gillis

MUSCLE is a computer program for creating multiple sequence alignments of proteins. It combines fast distance estimation using k-mer counting, progressive alignment using a profile function called the log-expectation score, and refinement using tree-dependent restricted partitioning. Its alignments can be used in phylogenetic tree estimation, structure prediction, and critical residue identification, all of which are useful for virus identification and discovery.
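To make the idea of k-mer based distance estimation concrete, here is a minimal Python sketch. It is an illustration only, not MUSCLE's implementation: the function names, the default `k=3`, and the exact distance formula (one minus the fraction of shared k-mers) are assumptions chosen for simplicity, and MUSCLE's own measure differs in detail.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Illustrative distance in [0, 1]: 1 minus the fraction of shared k-mers.

    This is a simplified stand-in, not the formula MUSCLE itself uses.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must contain at least k residues")
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())  # multiset intersection of k-mer counts
    return 1.0 - shared / denom


# Two short, made-up peptides that differ at a single position.
print(kmer_distance("MVLSPADK", "MVLSGADK"))  # 0.5
```

No alignment is needed to compute this value, which is why k-mer counting is fast enough to estimate distances between many sequences before the progressive alignment starts.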

Tutorial Objective: We will use MUSCLE to produce a multiple sequence alignment of hemoglobin subunit sequences from humans, house mice, and goats.

Input / Prerequisites

  • Tool Weblink
  • Link to example data formats
  • Web browser
  • Three or more sequences of interest in GCG, FASTA, EMBL (nucleotide only), GenBank, PIR, NBRF, PHYLIP, or UniProtKB/Swiss-Prot (protein only) format, either pasted in directly or uploaded as a file.

Output

The multiple sequence alignment result is displayed in a browser tab, divided into the sections Alignment, Tool Output, Phylogenetic Tree, and Result Viewers. Before submitting the job, users select one of seven output formats from a dropdown list:

  1. Pearson/FASTA: Plain text format for storing sequences; starts with a header line beginning with ">" followed by the sequence.

  2. ClustalW: Alignment output from the Clustal family of tools, organized in blocks.

  3. ClustalW (strict): Similar to standard ClustalW, but with stricter output conventions.

  4. HTML: Sequence alignments rendered as HTML, typically with color-coded residues for clarity.

  5. GCG MSF: A format developed for the GCG suite of bioinformatics tools.

  6. Phylip interleaved: A compact format used by the Phylip suite of programs, displaying sequences interleaved across lines.

  7. Phylip sequential: Similar to Phylip interleaved, but each sequence is written out in full before the next one begins.

For the sake of demonstration, we will choose Pearson/FASTA as the output format.
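As a rough illustration of how simple the Pearson/FASTA format is to work with, the following Python sketch reads FASTA text into a dictionary. The function name `parse_fasta` and the sample records are hypothetical; the sequences are short made-up strings, not real hemoglobin data.

```python
def parse_fasta(text):
    """Parse Pearson/FASTA text into a {header: sequence} dictionary.

    Each record starts with a ">" header line; the sequence may span
    several lines and, in an alignment, may contain "-" gap characters.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up example records (hypothetical names and sequences).
example = """>seq_human
MVLSPADK
TNVK
>seq_mouse
MVLSGEDK
SN-K
"""

alignment = parse_fasta(example)
print(alignment["seq_human"])  # MVLSPADKTNVK
print(alignment["seq_mouse"])  # MVLSGEDKSN-K
```

In aligned output, every sequence has the same length because gaps are padded with `-`, which makes column-by-column comparison straightforward.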

1. Navigate to MUSCLE Web Tool

2. Under the section 'Input Sequence', click on the button Use the example

3. Under the section Parameters, select Pearson/FASTA as the output format

4. Under the section Submit, name the job and hit Submit

5. View the result

  • Alignment: a visualization of the alignment, with amino acids shown in different colours and buttons for zooming in and out

  • Tool Output: the raw output of the tool; users can Download the tool output or Show the alignment with colours

  • Phylogenetic Tree: the evolutionary relationships between the input sequences, with a slider for zooming in and out of the tree

  • Result Viewers: links to related result viewers for investigating the results further

  • Result Files: all result files, with links to download them

  • Submission Details: details about this job submission

Conclusion

That's it! You've used MUSCLE to produce a multiple sequence alignment of hemoglobin subunit sequences from humans, house mice, and goats!

In this example, we can see that loci 102-111 align very well across the three proteins, so they may possess an important function. We can gather stronger evidence for this hypothesis by aligning other, similar sequences alongside these three and checking whether loci 102-111 are also conserved in the new sequences.

Loci 102-111
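One way to look for well-aligned stretches like this without inspecting the alignment by eye is to scan it column by column. The sketch below is a minimal, assumed approach: the function name `conserved_runs`, the `min_len` threshold, and the strict "all residues identical, no gaps" criterion are illustrative choices, and the input sequences are made up.

```python
def conserved_runs(aligned, min_len=5):
    """Return 1-based, inclusive (start, end) ranges of fully conserved columns.

    A column counts as conserved when every sequence has the same
    non-gap residue there. Runs shorter than min_len are ignored.
    """
    seqs = list(aligned)
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")

    runs = []
    start = None
    # Iterate one step past the end so a run touching the last column is closed.
    for i in range(length + 1):
        conserved = (
            i < length
            and seqs[0][i] != "-"
            and all(s[i] == seqs[0][i] for s in seqs)
        )
        if conserved:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    return runs


# Made-up aligned sequences (not the tutorial's hemoglobin data).
toy = ["MVLSPADKT", "MVLSGADKT", "MV-SAADKT"]
print(conserved_runs(toy, min_len=3))  # [(6, 9)]
```

Requiring exact identity is a deliberately strict criterion; a real analysis might also accept substitutions between chemically similar residues.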

For viruses, users can follow this example, substituting real viral protein sequences for the example data, to investigate conserved regions, variants, and evolutionary relationships.

See Also: