Snakemake workflow query #80

Open
GeorgetteTanner opened this issue Nov 4, 2019 · 1 comment

@GeorgetteTanner

Hi Gavin

I've been running the Titan Snakemake workflow and would be grateful if you could clarify a few things for me:

  1. In the optimalClusterSolution.txt file, what does the "_clusterX" suffix added to the tumour name in the id and path columns refer to? I assumed it was the maximum number of clusters used in the run selected as optimal, since the cluster number often equals the number of clusters estimated in the numClust column. Occasionally, however, it is higher, which confused me. E.g.:

| Phi | id | barcode | numClust | cellPrev | purity | norm | ploidy | loglik | sdbw | path |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | tumor_1_cluster5 | tumor_1 | 4 | 1,0.9339,0.6833,0.5808 | 0.6305 | 0.3695 | 3.257 | -8419.32 | 1.3994 | results/titan/hmm/titanCNA_ploidy3//tumor_1_cluster5 |

  2. In the config.yaml file I've set TitanCNA_alphaK to 2500 for use with WES data. Should I also adjust TitanCNA_alphaR (I wasn't sure what this parameter refers to), and should I add "TitanCNA_alphaKHigh: 2500" as well?

Best regards,
Georgette

@gavinha
Owner

gavinha commented Dec 19, 2019

> 1. In the optimalClusterSolution.txt file, what does the "_clusterX" suffix added to the tumour name in the id and path columns refer to? I assumed it was the maximum number of clusters used in the run selected as optimal, since the cluster number often equals the number of clusters estimated in the numClust column. Occasionally, however, it is higher, which confused me. E.g.:
>
> | Phi | id | barcode | numClust | cellPrev | purity | norm | ploidy | loglik | sdbw | path |
> |---|---|---|---|---|---|---|---|---|---|---|
> | 3 | tumor_1_cluster5 | tumor_1 | 4 | 1,0.9339,0.6833,0.5808 | 0.6305 | 0.3695 | 3.257 | -8419.32 | 1.3994 | results/titan/hmm/titanCNA_ploidy3//tumor_1_cluster5 |

Yes, cluster5 means that the solution was initialized with 5 clusters. Some post-processing then removes clusters, because the model tended to overfit with too many clusters, which is why numClust can be lower than the number in the id.
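As a rough sketch of the relationship described above, the Python snippet below parses the example row from this issue and compares the cluster count encoded in the id (the number of clusters the run was initialized with) against numClust (the clusters kept after post-processing). It assumes the file is tab-separated with the header shown in the question and that every id ends in "_cluster<N>"; the function name summarize is hypothetical and not part of the TitanCNA workflow.

```python
import csv
import io
import re

# One row copied from the optimalClusterSolution.txt example in this issue
# (tab-separated, cellPrev spacing cleaned up).
EXAMPLE = (
    "Phi\tid\tbarcode\tnumClust\tcellPrev\tpurity\tnorm\tploidy\tloglik\tsdbw\tpath\n"
    "3\ttumor_1_cluster5\ttumor_1\t4\t1,0.9339,0.6833,0.5808\t0.6305\t0.3695\t3.257\t"
    "-8419.32\t1.3994\tresults/titan/hmm/titanCNA_ploidy3//tumor_1_cluster5\n"
)

# Assumption: the id ends in "_cluster<N>", N = clusters used to initialize the run.
CLUSTER_SUFFIX = re.compile(r"_cluster(\d+)$")


def summarize(text):
    """Return (barcode, initial clusters, retained clusters, cellPrev values) per row."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        match = CLUSTER_SUFFIX.search(rec["id"])
        if match is None:
            raise ValueError(f"unexpected id format: {rec['id']}")
        initial = int(match.group(1))
        retained = int(rec["numClust"])
        # cellPrev holds one comma-separated prevalence per retained cluster in this example
        prevalences = [float(x) for x in rec["cellPrev"].split(",")]
        rows.append((rec["barcode"], initial, retained, prevalences))
    return rows


for barcode, initial, retained, prev in summarize(EXAMPLE):
    print(f"{barcode}: initialized with {initial}, kept {retained}, "
          f"removed {initial - retained}; cellPrev has {len(prev)} values")
```

For the example row this reports that tumor_1 was initialized with 5 clusters and kept 4, with one cellPrev value per retained cluster.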

> 2. In the config.yaml file I've set TitanCNA_alphaK to 2500 for use with WES data. Should I also adjust TitanCNA_alphaR (I wasn't sure what this parameter refers to), and should I add "TitanCNA_alphaKHigh: 2500" as well?

TitanCNA_alphaR is only used when the allelic read counts are modeled with a Gaussian distribution instead of a binomial. This is only relevant when allelic read counts are extremely high. For 10X Genomics data, which contain haplotype block information, the counts are aggregated over the SNPs within each block, so they are much higher. You therefore only need to consider this parameter if you are using the 10X Snakemake pipeline.
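To illustrate why very high counts make a Gaussian model reasonable, here is a generic Python sketch, not TitanCNA's code. It sums per-SNP allele counts within haplotype blocks, assuming a hypothetical input of (block_id, count_hapA, count_hapB) tuples already oriented to the phased haplotypes, and then compares the exact binomial log-likelihood with its normal approximation (mean n*p, variance n*p*(1-p)) at low and high depth. All helper names are illustrative only.

```python
import math
from collections import defaultdict


def aggregate_by_block(snps):
    """Sum per-SNP allele counts within each haplotype block.

    Hypothetical input: (block_id, count_hapA, count_hapB) tuples whose
    counts are already oriented to the block's phased haplotypes.
    """
    totals = defaultdict(lambda: [0, 0])
    for block, a, b in snps:
        totals[block][0] += a
        totals[block][1] += b
    return {block: tuple(ab) for block, ab in totals.items()}


def binomial_logpmf(k, n, p):
    """Exact binomial log-likelihood of k reads from one allele out of n."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def gaussian_logpdf(k, n, p):
    """Normal approximation to the binomial: mean n*p, variance n*p*(1-p)."""
    var = n * p * (1 - p)
    return -0.5 * math.log(2 * math.pi * var) - (k - n * p) ** 2 / (2 * var)


def approximation_gap(k, n, p):
    """Absolute difference between exact and approximate log-likelihoods."""
    return abs(binomial_logpmf(k, n, p) - gaussian_logpdf(k, n, p))


snps = [("block1", 6, 4), ("block1", 5, 5), ("block1", 7, 3), ("block2", 12, 8)]
blocks = aggregate_by_block(snps)
print(blocks)  # {'block1': (18, 12), 'block2': (12, 8)}

print(approximation_gap(5, 10, 0.5))      # depth typical of a single SNP
print(approximation_gap(500, 1000, 0.5))  # depth after aggregating many SNPs
```

At the expected count, the gap between the two log-likelihoods shrinks by about two orders of magnitude, from roughly 0.025 at 10 reads to roughly 0.00025 at 1,000 reads.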

