Running compass with data from multiple samples #18

johan-gson · 2024-08-25T03:30:57Z

Hi,

I'm experimenting with running COMPASS (using CNVs) on multiple samples (3 samples), where the cells come from different MissionBio runs. One problem is that the samples were sequenced at quite different depths, which means that the copy number counts for regions vary a lot between samples (and hence between cells). I looked at the code and couldn't see that you normalize them per cell or anything similar (correct me if I'm wrong). Should I normalize the data somehow before passing it in? One obvious option would be to normalize the copy number counts across samples, and potentially across amplicons if that is a problem. Would you recommend doing something like that?

Another question: from the MissionBio protein data, I know which cells are normal and which are likely malignant. Can I supply that information somehow? I also know which variants are somatic and which are germline. Is it enough to set FREQ to 0 for the somatic variants and 1 for the germline ones? The germline variants are there to support the CNVs.

e-sollier · 2024-08-25T06:31:35Z

Hi,

Thanks for your interest in COMPASS!
Differences in sequencing depth across cells should not be a problem, because COMPASS models the read counts in regions with a negative binomial distribution.
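
To illustrate why depth differences need not be normalized away under such a model, here is a minimal sketch. It is not COMPASS's actual code: it assumes the expected count for a region is proportional to the cell's total depth, a per-region weight, and the copy number divided by 2. The names `region_weight` and `size`, and the default values, are illustrative assumptions.

```python
import math

def nb_logpmf(k, mean, size):
    """Log PMF of a negative binomial parameterized by its mean and size (inverse dispersion)."""
    p = size / (size + mean)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log(1.0 - p))

def region_loglik(count, cell_depth, region_weight, copy_number, size=10.0):
    """Log-likelihood of a region count; the expected count scales with the cell's own depth."""
    mean = cell_depth * region_weight * copy_number / 2.0
    return nb_logpmf(count, mean, size)

def best_copy_number(count, cell_depth, region_weight, candidates=(1, 2, 3)):
    """Copy number among the candidates that maximizes the likelihood for this cell."""
    return max(candidates,
               key=lambda cn: region_loglik(count, cell_depth, region_weight, cn))

# Two cells with a 10x difference in depth, same relative coverage of the region.
print(best_copy_number(50, 1000, 0.05))
print(best_copy_number(500, 10000, 0.05))
```

Because the expected count is computed from each cell's own depth, a shallow cell and a deep cell with the same relative coverage favor the same copy number.
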
There is currently no way to pass the normal/malignant status of the cells to COMPASS as input. But I guess you could use this information after running COMPASS to check that its results make sense.
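
A post-hoc check along those lines could look like the sketch below. It assumes you have already parsed two mappings yourself: `node_of_cell` (cell barcode to assigned tree node) and `status_of_cell` (cell barcode to `"normal"` or `"malignant"` from the protein data). Both names and the input format are hypothetical, and how you obtain them from your own output files is not shown.

```python
from collections import Counter, defaultdict

def normal_fraction_per_node(node_of_cell, status_of_cell):
    """Return, for each tree node, the fraction of its assigned cells labeled 'normal'."""
    counts = defaultdict(Counter)
    for cell, node in node_of_cell.items():
        counts[node][status_of_cell[cell]] += 1
    return {node: c["normal"] / sum(c.values()) for node, c in counts.items()}

# Toy example: node 0 is the root, node 1 carries somatic events.
node_of_cell = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}
status_of_cell = {"c1": "normal", "c2": "normal", "c3": "malignant", "c4": "normal"}
print(normal_fraction_per_node(node_of_cell, status_of_cell))
```

A node that carries somatic events but has a high fraction of normal cells would be a sign that the inferred tree does not match the protein-based labels.
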
Yes, if you set the FREQ of germline variants to a high value, this should greatly increase the likelihood that COMPASS places them at the root of the tree.

johan-gson · 2024-08-25T16:05:03Z

Hi, and thanks for the quick reply!

The problem I'm experiencing is that I have two (at least almost) clonal mutations, which exist in all cancer cells (set FREQ=0), roughly 1,000 cells. Then I have roughly 1,600 normal cells. I also throw in ~60 germline variants (FREQ=1) to support CNVs.

What happens after some manipulation of the parameters is that the germline events end up at the root with very few cells, followed by a node with the two mutations and very few cells, and below that a branch with CNLOH where those mutations are lost. This is where the big lump of cells ends up, including the normals. I understand how the algorithm can find this appealing, but it is just not right :). There is also another CNLOH that I want captured, which it seems to handle fine.

Since I have done my fair share of C++ coding, I think I can modify the algorithm by passing in information about both the cells and the events, to separate the handling of germline and somatic variants more clearly (by penalizing them differently) and also to penalize the cells differently. I think this can just be added to Tree::compute_prior_score. Would that make sense? Thanks for the help - I'll create a fork and see what I can do!
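
To make the idea concrete, here is a rough sketch of the kind of extra prior term I have in mind. It is written in Python only for brevity (the real change would go into the C++ code), and everything in it is hypothetical: the function name, the data layout, and the penalty values are my own placeholders and not part of COMPASS.

```python
def extra_log_prior(node_events, event_kind, root=0,
                    germline_off_root_penalty=5.0,
                    somatic_at_root_penalty=5.0):
    """Hypothetical additive log-prior term that treats germline and somatic events differently.

    node_events: dict mapping node id -> list of event names placed at that node
    event_kind:  dict mapping event name -> "germline" or "somatic"
    """
    score = 0.0
    for node, events in node_events.items():
        for event in events:
            kind = event_kind[event]
            if kind == "germline" and node != root:
                # Germline variants are expected at the root.
                score -= germline_off_root_penalty
            elif kind == "somatic" and node == root:
                # Somatic variants should not be shared by all cells.
                score -= somatic_at_root_penalty
    return score

kinds = {"g1": "germline", "g2": "germline", "s1": "somatic"}
print(extra_log_prior({0: ["g1", "g2"], 1: ["s1"]}, kinds))
print(extra_log_prior({0: ["s1"], 1: ["g1", "g2"]}, kinds))
```

A similar per-cell term (for example, penalizing cells known to be normal when they attach below a node with somatic events) would need access to the cell attachments, which this sketch does not cover.
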
