Home
Steps used to analyse PANGEA data from South Africa.
South Africa
- Step 1: Get South African sequences from PANGEA extract
- Step 2: Get gag, pol and env from HIV genomic sequences
- Step 3: Get only gene sequences >= 800 bp
- Step 4: Subtype sequences per gene
- Step 5: Download GenBank HIV sequences
- Step 6: Get sequences from PANGEA
- Step 7: Create FASTA database for blast
- Step 8: Merge databases for blast
- Step 9: blastn per gene
- Step 10: Get 3 best matches
- Step 11: Get metadata for best matches
- The case of env gene
- Step 12: DNA sequence alignment
- Step 13: Removal of drug resistance sites
- Step 14: Detection of recombinant sequences
- Step 15: Removal of additional CGR sequences
- Step 16: Estimation of phylogenetic trees
- Final DNA sequence alignment
- Wait for the ML runs to finish, then run the bash scripts. Read about the scripts here.
- After running the bash scripts, you should have the best ML tree. Remove the last 0,0 that RAxML adds to the Newick file. As I recall, leaving it in causes the tree to be displayed as rooted instead of unrooted.
- Run treedater on the best ML tree. You can use the R scripts treedater_gag.R, treedater_pol.R, and treedater_env.R.
NOTE: the scripts used for the treedater analyses are organized in sets of three per gene, for example treedater_pol.R, pol.R and get_SA_data.R. Run only treedater_pol.R; the other two scripts are sourced from within it or from pol.R.
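The clean-up of the best ML tree mentioned above can be sketched in shell. This is only an illustration, not one of the scripts in this repository. It assumes that the trailing "0,0" is a zero root branch length written as `:0.0` (possibly with more zeros) immediately before the final `;`; check your own file before relying on it. The function name `strip_root_length` and the demo tree are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a trailing zero root branch length from a Newick file.
# ASSUMPTION: the "0,0" appears as ":0.0" (or ":0.000000") right before the final ";".
strip_root_length() {
  # Replace ":0.0...;" at the end of a line with ";" and print the result.
  sed -E 's/:0\.0+;[[:space:]]*$/;/' "$1"
}

# Demo on a made-up tree written to a temporary file.
demo_in=$(mktemp)
printf '%s\n' '(A:0.1,B:0.2,(C:0.3,D:0.4):0.05):0.0;' > "$demo_in"
demo_out=$(strip_root_length "$demo_in")
echo "$demo_out"
rm -f "$demo_in"
```

To keep the result, redirect the output to a new file (for example `strip_root_length best.tree > best_clean.tree`, where both file names are placeholders) rather than overwriting the original tree.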
Because I ran the search for the best maximum likelihood tree in parallel using RAxML-NG, I wrote the bash scripts located here to post-process all the files that are generated.
ML_script.sh is used to post-process the files generated when searching for the best maximum likelihood tree.
bootstrap_script.sh is used to post-process the files for the bootstrapped trees.
ALWAYS REMEMBER TO CHANGE THE DIRECTORY IN THE SCRIPT TO THE ONE IN WHICH YOUR FILES ARE LOCATED
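As a rough idea of what post-processing parallel ML searches involves, the sketch below picks the run with the highest final log-likelihood. It is not ML_script.sh and may differ from it in every detail. It assumes that each run wrote a log named `<prefix>.raxml.log` containing a line of the form `Final LogLikelihood: <value>`; the function name `best_run` and the demo files are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: this is NOT the author's ML_script.sh.
# ASSUMPTION: every run left a <prefix>.raxml.log with a line
# "Final LogLikelihood: <value>" in the directory passed as the first argument.
best_run() {
  local dir="$1"
  local log
  # Emit "<logL> <log path>" per run, sort numerically descending, keep the top path.
  for log in "$dir"/*.raxml.log; do
    [ -e "$log" ] || continue
    awk -v f="$log" '/Final LogLikelihood:/ {print $NF, f}' "$log"
  done | sort -k1,1nr | head -n 1 | cut -d' ' -f2-
}

# Demo with two made-up log files in a temporary directory.
demo_dir=$(mktemp -d)
printf 'Final LogLikelihood: -10500.25\n' > "$demo_dir/run1.raxml.log"
printf 'Final LogLikelihood: -10432.10\n' > "$demo_dir/run2.raxml.log"
best_log=$(best_run "$demo_dir")
best_name=$(basename "$best_log" .raxml.log)
echo "Best run: $best_name"
rm -rf "$demo_dir"
```

The directory is passed as an argument so that the path does not have to be edited inside the function itself.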
VERSIONS OF PROGRAMS I USED
MAFFT version 7.427 (2019/Mar/24)
RAxML-NG version 0.8.1
RDP4 version 4.97
AliView version 1.18.1
R version 3.5.1
BLAST command line applications version 2.6.0+
EDirect version 11.1