Project goal:
- Quantify SRC result variability across SRC tools
- Quantify SRC result variability across initializing random seeds
In this project, we have expanded the subclonal reconstruction pipeline https://github.com/uclahs-cds/pipeline-call-SRC to accept the output of multiple mutation callers.
We have integrated four additional mutation callers (MuTect2, SomaticSniper, Strelka2, Battenberg) by creating parsers that extract variant data from each tool's output. The parsers are available at https://github.com/uclahs-cds/tool-SRC-util.
Generate initial seed:
head -c 4 /dev/urandom | od -An -tu4
=> 3058353505
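For environments without `od`, the same kind of draw can be sketched in Python. This is a minimal illustration, not the command used for this project; the function name `draw_initial_seed` is hypothetical. It reads 4 bytes of OS entropy and interprets them as an unsigned 32-bit integer in host byte order, which is what `od -An -tu4` prints.

```python
import os
import sys


def draw_initial_seed() -> int:
    """Return an unsigned 32-bit integer built from 4 bytes of OS entropy."""
    # os.urandom draws from the operating system's entropy source
    # (/dev/urandom on Linux). sys.byteorder mirrors od's host byte order.
    return int.from_bytes(os.urandom(4), byteorder=sys.byteorder)


if __name__ == "__main__":
    print(draw_initial_seed())
```

Each call returns a fresh value, so the seed actually used (3058353505) has to be recorded rather than regenerated.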
Generate 10 random seeds:
import random
random.seed(3058353505)
random.sample(range(0, 1000000), k=10)
Generated/chosen seeds:
[51404, 366306, 423647, 838004, 50135, 628019, 97782, 253505, 659767, 13142]
[HNSC] Head and Neck cohort: /hot/project/disease/HeadNeckTumor/HNSC-000084-LNMEvolution/data/
[CPCGENE] Prostate cohort: /hot/project/disease/ProstateTumor/PRAD-000005-293PT/
sSNV-caller: mutect2, strelka2, somaticsniper
sCNA-caller: battenberg
src-tool: pyclone-vi, dpclust, phylowgs
- PyClone-VI: F32 (average run time 15s - 10min)
- DPClust: F32 (average run time 5min - 40min)
- PhyloWGS: F72 (average run time 7h - 24h)
sSNV-caller output:
/hot/project/disease/HeadNeckTumor/HNSC-000084-LNMEvolution/data/SNV/<sSNV-caller>/recsnv/vcfs/
sCNA-caller output:
/hot/project/disease/HeadNeckTumor/HNSC-000084-LNMEvolution/data/<sCNA-caller>/
[HNSC]
- Single region mode (sr) (run on primary tumour only)
- Multi region mode (mr) (run on primary and metastatic tumours)
[CPCGENE]
- Multi region mode (mr) (run on multiple regions of primary tumour)
The pipeline is run for each sample and seed in both single region and multi region mode.
Templates: /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/templates/
configs:
/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-<sSNV-caller>-<sCNA-caller>-<src-tool>/input/config/seed_<seed>.config
- 1 config per seed
- indicates src-tool choice and pipeline parameters
- indicates pipeline run output directory
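The config path pattern above can be expanded programmatically, for example to check that every seed has a config before submitting. The sketch below is illustrative only: the helper `config_path` is hypothetical, and it assumes the placeholders are filled with the lowercase caller and tool names listed earlier.

```python
from pathlib import PurePosixPath

# Base directory taken from the config path pattern in this README.
BASE = PurePosixPath(
    "/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src"
)

# The ten seeds listed in this README.
SEEDS = [51404, 366306, 423647, 838004, 50135, 628019, 97782, 253505, 659767, 13142]


def config_path(snv_caller: str, cna_caller: str, src_tool: str, seed: int) -> PurePosixPath:
    """Build the expected config path for one caller/tool/seed combination."""
    # Assumption: directory names use the lowercase identifiers as written.
    run_dir = f"run-{snv_caller}-{cna_caller}-{src_tool}"
    return BASE / run_dir / "input" / "config" / f"seed_{seed}.config"


# One config per seed for a single caller/tool combination.
paths = [config_path("mutect2", "battenberg", "pyclone-vi", s) for s in SEEDS]

if __name__ == "__main__":
    for p in paths:
        print(p)
```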
yamls:
/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/<sSNV-caller>_<sCNA-caller>_yamls/
single-region mode (one primary tumor sample):
single-region/<sample_id>.yaml
multi-region mode (primary and metastatic tumor samples):
multi-region/<patient_id>.yaml
- 1 yaml per patient
- path to sSNV-caller output
- path to sCNA-caller output
Log files:
/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-<sSNV-caller>-<sCNA-caller>-<src-tool>/logs/
Create a conda environment with pyyaml and numpy installed. Execute the submission script from the logs directory with the conda environment activated.
submission script: <mode>_<sSNV-caller>_<sCNA-caller>_<src-tool>_submission_script.sh
- Strelka2-Battenberg-PyClone-VI (sr/mr)
- Strelka2-Battenberg-DPClust (sr)
- Strelka2-Battenberg-PhyloWGS (sr)
- SomaticSniper-Battenberg-PyClone-VI (sr/mr)
- SomaticSniper-Battenberg-DPClust (sr)
- SomaticSniper-Battenberg-PhyloWGS (sr)
- Mutect2-Battenberg-PyClone-VI (sr/mr)
- Mutect2-Battenberg-DPClust (sr)
- Mutect2-Battenberg-PhyloWGS (sr)
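The caller, tool, and mode combinations listed above can be enumerated to derive the expected submission script names. This is a sketch under assumptions: the function `submission_scripts` is hypothetical, and the lowercase spelling of caller and tool names inside the filenames is assumed from the identifiers used elsewhere in this README, not confirmed against the logs directories.

```python
from itertools import product

SNV_CALLERS = ["strelka2", "somaticsniper", "mutect2"]
CNA_CALLER = "battenberg"

# Modes per SRC tool, as listed above: only PyClone-VI has multi-region runs.
MODES_BY_TOOL = {
    "pyclone-vi": ["sr", "mr"],
    "dpclust": ["sr"],
    "phylowgs": ["sr"],
}


def submission_scripts() -> list[str]:
    """Return one script name per (mode, sSNV-caller, src-tool) combination."""
    names = []
    for snv, (tool, modes) in product(SNV_CALLERS, MODES_BY_TOOL.items()):
        for mode in modes:
            # Pattern: <mode>_<sSNV-caller>_<sCNA-caller>_<src-tool>_submission_script.sh
            names.append(f"{mode}_{snv}_{CNA_CALLER}_{tool}_submission_script.sh")
    return names


scripts = submission_scripts()

if __name__ == "__main__":
    print(len(scripts), "submission scripts")
    for name in scripts:
        print(name)
```

With three sSNV callers and four tool/mode pairs, this yields 12 script names; each corresponds to runs over all ten seeds.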
Output files:
/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-<sSNV-caller>-<sCNA-caller>-<src-tool>/output/
Authors: Anna Neiman-Golden ([email protected]), Philippa Steinberg ([email protected])
[This project] is licensed under the GNU General Public License version 2. See the file LICENSE.md for the terms of the GNU GPL license.
<one line to give the project/program's name and a brief idea of what it does.>
Copyright (C) 2021 University of California Los Angeles ("Boutros Lab") All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.