build trees
blue-moon22 committed Jun 1, 2023
1 parent 7874f66 commit a206c4c
Showing 2 changed files with 43 additions and 1 deletion.
23 changes: 22 additions & 1 deletion README.md
@@ -94,7 +94,7 @@ bsub -G team284 -q normal -J combine_fastas -o ../log/combine_fastas.out -e ../l

_\*Run by Ana Ferreira_

## Output of the mSWEEP pipeline
## Outputs from pipeline

The output files can be found in study folders within `/nfs/users/nfs_v/vc11/scratch/ANALYSIS/deep_seq/data/msweep_output`. Each output is in its own directory, named after the lane id of the deep sequence and the number of reads it was subsampled to, and contains:

@@ -103,3 +103,24 @@ The output files can be found in study folders within `/nfs/users/nfs_v/vc11/scr
- `fastq.gz` files: the reads that have been identified as belonging to a particular GPSC
- directories named after the GPSC number: contain the assemblies of the GPSCs as `contigs.fa` and the serotype in a `*_seroba` directory _(Note: NA may exist if there were alignments to isolate genome(s) that had no GPSC)_
- `tmp` directory: contains the alignment files where the deep sequences were aligned (with Themisto) to the reference

## Build phylogenetic trees from output deconvoluted reads

Create phylogenetic trees from the deconvoluted reads using [snippy](https://github.com/tseemann/snippy), gubbins, snp-sites and FastTree.

In the scripts directory:
```
# Create an input.txt file for snippy
find /nfs/users/nfs_v/vc11/scratch/ANALYSIS/deep_seq/data/msweep_output/6461/*/ /nfs/users/nfs_v/vc11/scratch/ANALYSIS/deep_seq/data/msweep_output/6463/*/ -type f -name "*_1.fastq.gz" > ../data/path_to_reads_1.txt
find /nfs/users/nfs_v/vc11/scratch/ANALYSIS/deep_seq/data/msweep_output/6461/*/ /nfs/users/nfs_v/vc11/scratch/ANALYSIS/deep_seq/data/msweep_output/6463/*/ -type f -name "*_2.fastq.gz" > ../data/path_to_reads_2.txt
cat ../data/path_to_reads_1.txt | awk -F/ '{print $12"_"$13}' | awk -F'_1.fastq.gz' '{print $1}' > ../data/msweep_samples.txt
paste -d $'\t' ../data/msweep_samples.txt ../data/path_to_reads_1.txt ../data/path_to_reads_2.txt | grep -v "_NA" > ../data/snippy_input.tab
# Build trees (tested with the first 10 samples)
head ../data/snippy_input.tab > ../data/snippy_input_head.tab
bsub -G team284 -q normal -J build_tree -o ../log/build_tree.out -e ../log/build_tree.err -R"span[hosts=1]" -R "select[mem>16000] rusage[mem=16000]" -M16000 -n 4 "./run_build_trees.sh /nfs/users/nfs_v/vc11/scratch/ANALYSIS/deep_seq/data/snippy_input_head.tab /data/pam/applications/vr-pipelines/refs/Streptococcus/pneumoniae_ATCC_700669/Streptococcus_pneumoniae_ATCC_700669_v1.fa 4 ../data/test_build_tree"
# Build trees (not implemented)
bsub -G team284 -q normal -J build_tree -o ../log/build_tree.out -e ../log/build_tree.err -R"span[hosts=1]" -R "select[mem>64000] rusage[mem=64000]" -M64000 -n 16 "./run_build_trees.sh /nfs/users/nfs_v/vc11/scratch/ANALYSIS/deep_seq/data/snippy_input.tab /data/pam/applications/vr-pipelines/refs/Streptococcus/pneumoniae_ATCC_700669/Streptococcus_pneumoniae_ATCC_700669_v1.fa 16 <your_output_directory_here>"
```
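
The sample names in `msweep_samples.txt` come from splitting each read path on `/`. A minimal sketch of that derivation on a single made-up path follows; the lane directory `LANE_500000`, the file name `23_1.fastq.gz` and the helper `sample_name` are assumptions for illustration, not names from the repository.

```shell
#!/usr/bin/env bash

# Hypothetical read path: the lane directory and file name are invented.
example_path="/nfs/users/nfs_v/vc11/scratch/ANALYSIS/deep_seq/data/msweep_output/6461/LANE_500000/23_1.fastq.gz"

# Hypothetical helper that applies the same two awk steps as the README command.
sample_name() {
    # Field 12 is the lane directory and field 13 the read file name;
    # the second awk keeps everything before the _1.fastq.gz suffix.
    printf '%s\n' "$1" | awk -F/ '{print $12"_"$13}' | awk -F'_1.fastq.gz' '{print $1}'
}

sample_name "$example_path"   # prints LANE_500000_23
```

Fields 12 and 13 only line up for paths at exactly this depth, so a different base directory would need different field numbers. The multi-character separator in the second `awk` is treated as a regular expression, so each `.` in it matches any character.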

21 changes: 21 additions & 0 deletions scripts/run_build_trees.sh
@@ -0,0 +1,21 @@
#!/usr/bin/env bash

module load snippy/4.6.0
module load gubbins/3.2.1
module load snp-sites/2.5.1--hed695b0_0
module load fasttree/2.1.10=h470a237_2-c1

snippy_input=$1
reference=$2
cpus=$3
output_dir=$4

mkdir -p "${output_dir}"
cd "${output_dir}" || exit 1

snippy-multi "$snippy_input" --ref "$reference" --cpus "$cpus" > runme.sh
sh runme.sh
snippy-clean_full_aln core.full.aln > clean.full.aln
run_gubbins.py -p gubbins clean.full.aln
snp-sites -c gubbins.filtered_polymorphic_sites.fasta > clean.core.aln
FastTree -gtr -nt clean.core.aln > clean.core.tree
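
Because the script changes into the output directory before calling `snippy-multi`, the paths in the input table need to be absolute, and a missing read file would only surface once the job is running. A minimal pre-submission check could look like the sketch below. `check_snippy_input` is a hypothetical helper that is not part of this commit, and it assumes the three-column, tab-separated layout (sample, read 1, read 2) that the README commands produce.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns non-zero if any row of the table is
# malformed or points at a read file that does not exist.
check_snippy_input() {
    local tab_file=$1 status=0 id r1 r2 extra
    while IFS=$'\t' read -r id r1 r2 extra; do
        if [ -z "$id" ] || [ -z "$r1" ] || [ -z "$r2" ] || [ -n "$extra" ]; then
            # Fewer or more than three columns
            echo "malformed row: $id" >&2
            status=1
        elif [ ! -f "$r1" ] || [ ! -f "$r2" ]; then
            # One or both read files are missing
            echo "missing reads for sample: $id" >&2
            status=1
        fi
    done < "$tab_file"
    return "$status"
}
```

It could be run as `check_snippy_input ../data/snippy_input.tab` before the `bsub` call, submitting the job only if it succeeds.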
