How to deal with batch effects? #8

Open
jenzopr opened this issue Feb 26, 2019 · 2 comments

Comments

@jenzopr

jenzopr commented Feb 26, 2019

Thank you for the great contribution! I'm currently trying scHPF on a few datasets and found that it was easy to apply.

I'd like to know how to properly deal with batch effects (mostly technical covariates), e.g. when input was processed using different plates. Do you have recommendations?
A rather simple solution would be to identify the factors that correlate highly with the covariate of interest and simply leave them out of, e.g., further dimensionality reduction?

Best,
Jens

@hannaml
Contributor

hannaml commented Feb 28, 2019

Hi Jens,
Glad you have found scHPF useful! I think how you handle batch effects under the current scHPF model will be highly dataset-dependent.

In general, I think what you propose sounds totally reasonable. Keep in mind, though, that many human scRNA-seq experiments are completely batch confounded (i.e. samples come from different genetic backgrounds, from people with different environmental exposures, etc.), so it can be hard to tell a technical "batch" effect from real biology. If you take the throw-away approach, I think it's prudent to check that the factor in question isn't also correlated with a covariate you care about (if possible).
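
A minimal sketch of that check, assuming you have exported per-cell scores for each factor as plain sequences (how you get them out of scHPF is not shown here). It scores each factor by the fraction of its variance explained by a categorical covariate (eta squared), which is one possible stand-in for "correlates highly" when the covariate is a plate label. The function names, the toy data, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def batch_variance_explained(factor_scores, labels):
    """Fraction of one factor's variance explained by a categorical label (eta squared)."""
    n = len(factor_scores)
    grand_mean = sum(factor_scores) / n
    ss_total = sum((s - grand_mean) ** 2 for s in factor_scores)
    if ss_total == 0:
        return 0.0  # a constant factor cannot be explained by anything
    by_label = defaultdict(list)
    for score, label in zip(factor_scores, labels):
        by_label[label].append(score)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in by_label.values()
    )
    return ss_between / ss_total


def flag_batch_factors(scores_by_factor, batch_labels, threshold=0.5):
    """Return factors whose eta squared against the batch label exceeds the threshold."""
    eta2 = {
        name: batch_variance_explained(scores, batch_labels)
        for name, scores in scores_by_factor.items()
    }
    flagged = [name for name, value in eta2.items() if value > threshold]
    return flagged, eta2


# Toy data: four cells, two plates, two conditions (all values made up).
plates = ["p1", "p1", "p2", "p2"]
condition = ["ctrl", "stim", "ctrl", "stim"]
scores = {
    "factor_0": [0.0, 0.0, 10.0, 10.0],  # tracks the plate
    "factor_1": [1.0, 2.0, 1.0, 2.0],    # tracks the condition
}

flagged, eta2 = flag_batch_factors(scores, plates)

# Before dropping a flagged factor, score it against a covariate you care about.
bio_check = {
    name: batch_variance_explained(scores[name], condition) for name in flagged
}
```

A flagged factor with a low value in `bio_check` is a safer candidate to leave out than one that also separates the biological conditions.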

Another approach we have taken is to apply scHPF separately across different background conditions, and cluster the resulting factors using the top and bottom genes to find conserved expression modules. See Figure 4, Extended Data Figure 4 and Methods in Szabo, Levitin et al, 2019.
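
As a rough illustration of the idea (not the procedure from the paper, whose details are in its Methods), the sketch below pools factors from separately fitted models, represents each factor by its set of top-ranked genes, and groups factors whose gene sets overlap. It simplifies in two ways that should be treated as assumptions: it uses only top genes rather than top and bottom genes, and it replaces proper clustering with connected components over a Jaccard threshold. The function names, the `min_overlap` value, and the placeholder gene IDs are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def conserved_modules(top_genes_by_factor, min_overlap=0.3):
    """Group (dataset, factor) keys whose top-gene sets overlap across datasets.

    Factors are linked when their Jaccard similarity is at least min_overlap;
    groups are the connected components of those links. Only groups that span
    more than one dataset are returned.
    """
    names = list(top_genes_by_factor)
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            similarity = jaccard(
                top_genes_by_factor[first], top_genes_by_factor[second]
            )
            if similarity >= min_overlap:
                parent[find(first)] = find(second)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)

    return [
        members
        for members in groups.values()
        if len({dataset for dataset, _ in members}) > 1
    ]


# Toy input: keys are (dataset, factor index), values are placeholder gene IDs.
top_genes = {
    ("donor_1", 0): ["G1", "G2", "G3", "G4"],
    ("donor_2", 3): ["G2", "G3", "G4", "G5"],
    ("donor_1", 1): ["G6", "G7", "G8", "G9"],
    ("donor_2", 0): ["G10", "G11", "G12", "G13"],
}

modules = conserved_modules(top_genes)
```

In this toy input, only the two factors sharing three of their four top genes end up in a module; factors seen in a single dataset are left out.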

Let me know if you have any other questions.
Best,
Hanna

@jenzopr
Author

jenzopr commented Mar 1, 2019

Thanks a lot, Hanna, for elaborating on my proposed approach. Your correlation-based approach sounds very straightforward and might be more suitable for what I'm after. I will give it a try!
