
Handling huge indexes with few GBs of RAM #304

Open
leoisl opened this issue Nov 8, 2022 · 1 comment
leoisl commented Nov 8, 2022

This issue describes a new feature in pandora to handle huge indexes with only a few GB of RAM. The concrete example we have is an index with 186k PRGs, mostly linear; the main use case has almost 1M PRGs. For this "small" 186k-PRG example, running pandora compare with reads from 114 samples results in only 13.7k genes (7.3%) actually being found and ending up in the final multi-sample matrix/VCF. pandora takes 15.6 GB of RAM to run compare in this case, but could likely do it with a fraction of that if it loaded the index only for the relevant 13.7k genes instead of all 186k. RAM usage will be much higher for 1M PRGs, and we want to keep this runnable on common user desktops, i.e. at most 13 or 14 GB of RAM. A rough sketch of what this lazy loading could look like is given below.

For this use case we have a fixed vcf-ref for each PRG, so we could also run pandora compare (or map in this case) per sample and merge the results later. This feature is particularly important when running pandora compare/map for a single sample, as even fewer genes will be loaded.
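
The following is only a rough sketch of the lazy-loading idea, not pandora's actual index code: it assumes the serialized index records a per-PRG byte offset, so that compare/map can seek to and deserialize only the genes that are actually found in the reads. The names here (PrgOffsetIndex, PrgEntry, load_subset) are hypothetical.

```cpp
// Hypothetical sketch of lazy PRG loading; not pandora's real index format or API.
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Placeholder for whatever a deserialized per-PRG entry would hold.
struct PrgEntry {
    std::string name;
    std::string serialized_data;
};

class PrgOffsetIndex {
public:
    // Offsets would be recorded when the index is built (pandora index step).
    void add(const std::string &name, std::uint64_t offset) { offsets_[name] = offset; }

    // Load only the PRGs named in `wanted`, seeking directly to each entry
    // instead of deserializing the whole index into RAM.
    std::vector<PrgEntry> load_subset(const std::string &index_path,
                                      const std::unordered_set<std::string> &wanted) const {
        std::vector<PrgEntry> loaded;
        std::ifstream in(index_path, std::ios::binary);
        for (const auto &name : wanted) {
            const auto it = offsets_.find(name);
            if (it == offsets_.end()) continue;  // gene not in the index
            in.clear();
            in.seekg(static_cast<std::streamoff>(it->second));
            PrgEntry entry;
            entry.name = name;
            std::getline(in, entry.serialized_data);  // stand-in for real deserialization
            loaded.push_back(std::move(entry));
        }
        return loaded;
    }

private:
    std::unordered_map<std::string, std::uint64_t> offsets_;
};
```

With something along these lines, peak memory would scale with the 13.7k genes actually present in the samples rather than with the full 186k (or 1M) PRG index.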


leoisl commented Nov 8, 2022
