Example: PD graph of highly similar sequences

Benjamin Braun edited this page Jun 10, 2021 · 1 revision

This approach works well for protein-coding sequences of any length. The distances reflect pairwise PD scores computed over the entire sequences after alignment, so the sequences must be very similar for the distances to be informative.

We will align with a program such as ClustalW and convert the alignment to FASTA using Jalview (with the alignment dashes, -, included). We will then use the un-windowed PD distance feature of DGraph to compute a distance matrix, and finally use DGraph to create the figure.

  1. Align with Clustal and export an aln file or similar.
  2. Open the alignment file in Jalview.
  3. In the Jalview alignment window, choose File -> Save as... and select Fasta. You may need to select all the sequences (Ctrl+A) before exporting.
  4. Run DGraph, select 'Utility: Fasta -> PD Score Matrix: assume pre-aligned sequences', and provide the fasta from step (3).
  5. Run DGraph, select 'Run DGraph' and then 'Fasta sequences + custom distance matrix', and use the fasta from step (3) and the distance matrix from step (4).
  6. You will probably need to adjust the physics parameters for long sequences; press j to open the settings. The line drawing cutoff should be set <= the largest distance in the distance matrix from step (4) (open it in a text viewer to read the distances).
  7. You can show tags on the figure with the a button. Press s to save a PDF + PNG export of the view.
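
For a large matrix, reading the largest distance by eye in a text viewer can be tedious. The sketch below is one possible shortcut, not part of DGraph. It assumes the matrix file is plain text with numbers separated by whitespace, commas, or semicolons, which has not been checked against DGraph's actual output format. The function name `largest_distance` is made up for this example.

```python
import math
import re


def largest_distance(text):
    """Return the largest finite number found in text, or None if there is none.

    Assumption: distances are plain numeric tokens separated by whitespace,
    commas, or semicolons. Tokens that do not parse as numbers (for example
    sequence names) are skipped, so a purely numeric sequence name would be
    mistaken for a distance.
    """
    values = []
    for token in re.split(r"[,;\s]+", text):
        try:
            value = float(token)
        except ValueError:
            continue  # skip labels, headers, and empty tokens
        if math.isfinite(value):  # ignore nan / inf entries
            values.append(value)
    return max(values) if values else None
```

To use it, pass in the contents of the matrix file from step (4), for example `largest_distance(open("distances.txt").read())`, where `distances.txt` is a placeholder file name. The result is an upper bound to try for the line drawing cutoff.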