
Handling huge indexes with few GBs of RAM #304

Open
leoisl opened this issue Nov 8, 2022 · 1 comment
leoisl commented Nov 8, 2022

This issue describes a new feature in pandora to handle huge indexes with only a few GB of RAM. The concrete example we have is an index with 186k PRGs, mostly linear; the main use case has almost 1M PRGs. For this "small" 186k-PRG example, running pandora compare with reads from 114 samples results in only 13.7k genes (7.3%) actually being found and ending up in the final multi-sample matrix/VCF. pandora takes 15.6 GB of RAM to run compare in this case, but could likely do it with a fraction of that if it loaded the index only for the relevant 13.7k genes instead of all 186k. RAM usage will be much higher for 1M PRGs, and we want to keep this runnable on common user desktops, i.e. at most 13 or 14 GB of RAM. A rough sketch of what this lazy loading could look like is given below.

For this use case we have a fixed vcf-ref for each PRG, so we could also run pandora compare (or map in this case) per sample and merge the results later. This feature is particularly important when running pandora compare/map for a single sample, as even fewer genes will be loaded.
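
The following is only a rough sketch of the lazy-loading idea, not pandora's actual index code: it assumes the serialized index records a per-PRG byte offset, so that compare/map can seek to and deserialize only the genes that are actually found in the reads. The names here (PrgOffsetIndex, PrgEntry, load_subset) are hypothetical.

```cpp
// Hypothetical sketch of lazy PRG loading; not pandora's real index format or API.
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Placeholder for whatever a deserialized per-PRG entry would hold.
struct PrgEntry {
    std::string name;
    std::string serialized_data;
};

class PrgOffsetIndex {
public:
    // Offsets would be recorded when the index is built (pandora index step).
    void add(const std::string &name, std::uint64_t offset) { offsets_[name] = offset; }

    // Load only the PRGs named in `wanted`, seeking directly to each entry
    // instead of deserializing the whole index into RAM.
    std::vector<PrgEntry> load_subset(const std::string &index_path,
                                      const std::unordered_set<std::string> &wanted) const {
        std::vector<PrgEntry> loaded;
        std::ifstream in(index_path, std::ios::binary);
        for (const auto &name : wanted) {
            const auto it = offsets_.find(name);
            if (it == offsets_.end()) continue;  // gene not in the index
            in.clear();
            in.seekg(static_cast<std::streamoff>(it->second));
            PrgEntry entry;
            entry.name = name;
            std::getline(in, entry.serialized_data);  // stand-in for real deserialization
            loaded.push_back(std::move(entry));
        }
        return loaded;
    }

private:
    std::unordered_map<std::string, std::uint64_t> offsets_;
};
```

With something along these lines, peak memory would scale with the 13.7k genes actually present in the samples rather than with the full 186k (or 1M) PRG index.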


leoisl commented Nov 8, 2022
