Running compass with data from multiple samples #18

Open
johan-gson opened this issue Aug 25, 2024 · 2 comments

Comments

@johan-gson

Hi,

I'm experimenting with running COMPASS (using CNVs) on three samples, so the cells come from different MissionBio runs. The problem is that the samples were sequenced at quite different depths, so the read counts for a given region vary a lot between samples (and hence between cells). Looking at the code, I couldn't see any per-cell normalization (correct me if I'm wrong). Should I normalize the data somehow before passing it in? One obvious option would be to normalize the counts across samples, and possibly across amplicons if that is a problem. Would you recommend something like that?
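For reference, the kind of normalization the question proposes could look like the sketch below. This is a hypothetical illustration, not something COMPASS does: it assumes a cell-by-region matrix of raw read counts, and `Matrix`, `median` and `normalize_counts` are invented names. Each cell is first divided by its total read count (removing depth differences), then each region is divided by its median across cells (removing amplicon efficiency differences). Because a negative binomial model is defined on raw counts, values like these are more useful for inspecting depth effects than as model input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: counts[c][k] = raw read count of cell c in region (amplicon) k.
using Matrix = std::vector<std::vector<double>>;

// Median of a non-empty vector (taken by value so the caller's data is not reordered).
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 == 1 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Step 1: divide each cell by its total, so cells sequenced at different depths
// become comparable. Step 2: divide each region by its median across cells,
// so ~1.0 means "typical coverage for this amplicon".
Matrix normalize_counts(const Matrix& counts) {
    assert(!counts.empty());
    const std::size_t n_regions = counts[0].size();
    Matrix out(counts.size(), std::vector<double>(n_regions, 0.0));

    for (std::size_t c = 0; c < counts.size(); ++c) {
        assert(counts[c].size() == n_regions);
        double total = 0.0;
        for (double x : counts[c]) total += x;
        assert(total > 0.0);
        for (std::size_t k = 0; k < n_regions; ++k) out[c][k] = counts[c][k] / total;
    }

    for (std::size_t k = 0; k < n_regions; ++k) {
        std::vector<double> column;
        for (const auto& row : out) column.push_back(row[k]);
        const double m = median(column);
        if (m > 0.0) {
            for (auto& row : out) row[k] /= m;
        }
    }
    return out;
}
```

Two cells with the same profile but a 10× difference in depth both normalize to 1.0 in every region, while a region gained in a minority of cells would stand out above 1.0.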

Another question: from the MissionBio protein data, I know which cells are normal and which are likely malignant. Can I supply that information somehow? I also know which variants are somatic and which are germline. Is it enough to set FREQ to 0 for the somatic variants and 1 for the germline ones? The germline variants are there to support the CNVs.

@e-sollier
Collaborator

Hi,

Thanks for your interest in COMPASS!
The different sequencing depths across cells should not be a problem. COMPASS models the read counts in each region with a negative binomial distribution, so differing sequencing depths should be fine.
There is currently no way to give the normal/malignant status as input to COMPASS, but you could use this information after running COMPASS to check that its results make sense.
Yes, if you set the FREQ of germline variants to a high value, this should greatly increase the likelihood that COMPASS places them at the root of the tree.
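To illustrate why depth differences can be absorbed by the likelihood instead of by pre-normalization, here is a minimal sketch of a negative binomial whose mean is scaled by each cell's total depth. The parameterization (mean = cell depth × region share × copy number / 2, inverse dispersion `theta`) and the names `expected_reads`, `region_share` and `best_copy_number` are assumptions for illustration; COMPASS's exact model may differ.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Log-probability of x reads under a negative binomial with mean mu and
// inverse dispersion theta (variance = mu + mu^2 / theta).
double nb_logpmf(double x, double mu, double theta) {
    assert(mu > 0.0 && theta > 0.0 && x >= 0.0);
    return std::lgamma(x + theta) - std::lgamma(theta) - std::lgamma(x + 1.0)
         + theta * std::log(theta / (theta + mu))
         + x * std::log(mu / (theta + mu));
}

// Assumed parameterization: expected reads scale with the cell's total depth,
// the region's share of reads in a diploid cell, and copy number relative to 2.
double expected_reads(double cell_depth, double region_share, int copy_number) {
    return cell_depth * region_share * copy_number / 2.0;
}

// Copy number in [1, max_cn] with the highest likelihood for x observed reads.
// Copy number 0 is skipped because a zero mean is degenerate here.
int best_copy_number(double x, double cell_depth, double region_share,
                     double theta, int max_cn) {
    int best = 1;
    double best_ll = -std::numeric_limits<double>::infinity();
    for (int cn = 1; cn <= max_cn; ++cn) {
        const double ll =
            nb_logpmf(x, expected_reads(cell_depth, region_share, cn), theta);
        if (ll > best_ll) {
            best_ll = ll;
            best = cn;
        }
    }
    return best;
}
```

With `region_share = 0.01`, a diploid region gives about 100 reads in a 10,000-read cell and about 10 in a 1,000-read cell, and both are assigned copy number 2: the depth only shifts the expected mean, not the inferred state.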

@johan-gson
Author

Hi, and thanks for the quick reply! The problem I'm running into is this: I have two (at least almost) clonal mutations, which are present in all cancer cells (FREQ=0), roughly 1,000 cells. I also have roughly 1,600 normal cells, and I include ~60 germline variants (FREQ=1) to support the CNVs. After some tuning of the parameters, the germline events end up at the root with very few cells, followed by a node with the two mutations and very few cells, and below that a branch with a CNLOH in which those mutations are lost. That branch is where the bulk of the cells end up, including the normal ones. I understand why the algorithm finds this appealing, but it is just not right :). There is also another CNLOH that I want captured, and that one seems to be handled fine.

Since I have done my fair share of C++ coding, I think I can modify the algorithm to take in information about both the cells and the events, so that germline and somatic variants are handled more clearly separately (by penalizing them differently) and cells are penalized differently as well. I think this can simply be added to Tree::compute_prior_score; would that make sense? Thanks for the help, I'll create a fork and see what I can do!
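The change proposed above could be prototyped as an extra term in a tree's log-prior. The sketch below is hypothetical: `SimpleTree`, `EventKind`, `has_somatic_ancestor`, `label_penalty` and the penalty weights are invented names and do not reflect COMPASS's actual `Tree` class or the real signature of `Tree::compute_prior_score`. It also assumes each cell is hard-assigned to a node; if cell attachments are marginalized in the likelihood rather than the prior, the cell term would belong there instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical data layout for illustration only; not COMPASS's Tree class.
enum class EventKind { Germline, Somatic };

struct SimpleTree {
    std::vector<int> parent;            // parent[n] = parent of node n; root has -1
    std::vector<int> event_node;        // event_node[e] = node where event e is placed
    std::vector<EventKind> event_kind;  // known germline/somatic status of event e
};

// True if node n or any of its ancestors carries a somatic event.
bool has_somatic_ancestor(const SimpleTree& t, int n) {
    std::vector<bool> on_path(t.parent.size(), false);
    for (int v = n; v != -1; v = t.parent[v]) on_path[v] = true;
    for (std::size_t e = 0; e < t.event_node.size(); ++e)
        if (t.event_kind[e] == EventKind::Somatic && on_path[t.event_node[e]])
            return true;
    return false;
}

// Extra log-prior term (<= 0):
// - each germline event not placed at the root costs germline_penalty;
// - each known-normal cell attached at or below a somatic event costs normal_cell_penalty.
double label_penalty(const SimpleTree& t, int root,
                     const std::vector<int>& cell_node,
                     const std::vector<bool>& cell_is_normal,
                     double germline_penalty, double normal_cell_penalty) {
    assert(cell_node.size() == cell_is_normal.size());
    double score = 0.0;
    for (std::size_t e = 0; e < t.event_node.size(); ++e)
        if (t.event_kind[e] == EventKind::Germline && t.event_node[e] != root)
            score -= germline_penalty;
    for (std::size_t c = 0; c < cell_node.size(); ++c)
        if (cell_is_normal[c] && has_somatic_ancestor(t, cell_node[c]))
            score -= normal_cell_penalty;
    return score;
}
```

For a root carrying the germline events, a child with the two clonal mutations and a CNLOH grandchild, moving a germline event off the root or attaching a known-normal cell under the mutation node lowers the score, which discourages the configuration described above. With thousands of cells, computing `has_somatic_ancestor` once per node would avoid repeating the path walk for every cell.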
