Skip to content

Latest commit

 

History

History
109 lines (84 loc) · 4.42 KB

README.md

File metadata and controls

109 lines (84 loc) · 4.42 KB

R scripts for TitanCNA analysis

1. Standard Whole Genome/Exome Sequencing Analysis

R script (titanCNA.R) for running TitanCNA analysis on standard whole genome and exome sequencing data.

Input files
This script assumes that the necessary input files have been generated. These are generated by the KRONOS workflow.

  1. GC-corrected, normalized read coverage using the HMMcopy suite
    * For exome analysis, please use the targetedSequence argument to specify the dataframe containing the exon baits input from a bed file.
exons <- read.delim("exon_baits.bed", header = TRUE, as.is = TRUE)  
correctReadDepth(tumWig, normWig, gcWig, mapWig, genomeStyle = "NCBI", targetedSequence = exons)  
  1. Tumour allelic read counts at heterozygous SNPs (identifed from the normal sample).

Running the R script

  1. Look at the usage of the R script
# from the command line
> Rscript titanCNA.R --help
Usage: Rscript titanCNA.R [options]


Options:
      --id=ID
              Sample ID

      --hetFile=HETFILE
              File containing allelic read counts at HET sites. (Required)

      --cnFile=CNFILE
              File containing normalized coverage as log2 ratios. (Required)

      --outDir=OUTDIR
              Output directory to output the results. (Required)

      --numClusters=NUMCLUSTERS
              Number of clonal clusters. (Default: 1)

      --numCores=NUMCORES
              Number of cores to use. (Default: 1)

      --ploidy_0=PLOIDY_0
              Initial ploidy value; float (Default: 2)

      --estimatePloidy=ESTIMATEPLOIDY
              Estimate ploidy; TRUE or FALSE (Default: TRUE)

      --normal_0=NORMAL_0
              Initial normal contamination (1-purity); float (Default: 0.5)

      --estimateNormal=ESTIMATENORMAL
              Estimate normal contamination method; string {'map', 'fixed'} (Default: map)

      --maxCN=MAXCN
              Maximum number of copies to model; integer (Default: 8)
      (... additional arguments)

Additional arguments to consider are the following:
These arguments can be used to tune the model based on variance in the read coverage data and data-type (whole-exome sequencing or whole-genome sequencing).

--alphaK=ALPHAK
            Hyperparameter on Gaussian variance; for WES, use 2500; for WGS, use 10000; 
            float (Default: 10000)

--alphaKHigh=ALPHAKHIGH
            Hyperparameter on Gaussian variance for extreme copy number states; 
            for WES, use 2500; for WGS, use 10000; float (Default: 10000)
  1. Example usage of R script
# normalized coverage file: test.cn.txt
# allelic read count file: test.het.txt
Rscript titanCNA.R --id test --hetFile test.het.txt --cnFile test.cn.txt \
  --numClusters 1 --numCores 1 --normal_0 0.5 --ploidy_0 2 \
   --chrs "c(1:22, \"X\")" --estimatePloidy TRUE --outDir ./
  1. Running TitanCNA for multiple restarts and model selection titanCNA.R should be run with multiple restarts for different values of (a) Ploidy (2,3,4) and (b) Number of clonal clusters. This will lead to multiple solutions. Each set of solutions for a given initialization of ploidy value will be saved to a directory (e.g. run_ploidy2, run_ploidy3, run_ploidy4). The R script selectSolution.R will help select the optimal cluster from all these solutions. The output is a tab-delimited file indicating the selected solution, along with parameters for that run. It also includes the path to the results so users can collect the results.
numClusters=3
numCores=4
## run TITAN for each ploidy (2,3,4) and clusters (1 to numClusters)
echo "Maximum number of clusters: $numClusters";
for ploidy in $(seq 2 4)
  do
    echo "Running TITAN for $i clusters.";
    outDir=run_ploidy$ploidy
    mkdir $outDir
    for numClust in $(seq 1 $numClusters)
    do
      echo "Running for ploidy=$ploidy";
      Rscript titanCNA_v1.10.1.R --id test --hetFile test.het.txt --cnFile test.cn.txt \
      --numClusters $numClust --numCores $numCores --normal_0 0.5 --ploidy_0 $ploidy \
      --chrs "c(1:22, \"X\")" --estimatePloidy TRUE --outDir $outDir
    done
    echo "Completed job for $numClust clusters."
  done

## select optimal solution
Rscript selectSolution.R --ploidyRun2=run_ploidy2 --ploidyRun3=run_ploidy3 --ploidyRun4=run_ploidy4 --threshold=0.05 --outFile optimalClusters.txt