Test the impact on clustering of changing minimap2 index to use less RAM #31

Open
leoisl opened this issue May 17, 2022 · 3 comments

Comments

@leoisl
Collaborator

leoisl commented May 17, 2022

The current minimap2 index was built with -I 12G to match the H2H index. This pushes the RAM usage of tbpore process to 13.1 GB. We could instead build the index with -I 500M, which would bring the RAM usage of tbpore process down to ~5 GB and make it much more runnable on a personal laptop, but then the results are not identical to the H2H results. We should evaluate the impact of this different index on the clustering and on the tbpore results in general, and decide whether it is indeed OK to switch to this lighter index. This might be related to #22
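
For reference, the two index builds being compared could be sketched roughly as below. This is only an illustration: the file names are hypothetical, the helper `build_index_cmd` is not part of tbpore, and any preset or k-mer options used for the real index are left out because they are not stated in this issue. The only flags used are minimap2's `-I` (split the index every ~NUM input bases) and `-d` (write the index to a file).

```python
from typing import List


def build_index_cmd(reference: str, index_out: str, batch_size: str) -> List[str]:
    """Return a minimap2 command that builds an index with the given -I value.

    Hypothetical helper: it only assembles the argument list and does not
    run minimap2.
    """
    return ["minimap2", "-I", batch_size, "-d", index_out, reference]


# Hypothetical file names, for illustration only.
heavy = build_index_cmd("decontamination_db.fa", "db.12G.mmi", "12G")
light = build_index_cmd("decontamination_db.fa", "db.500M.mmi", "500M")

print(" ".join(heavy))
print(" ".join(light))
```

Running tbpore process once against each index and comparing the outputs would then show how much the lighter index changes the results.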

@leoisl
Collaborator Author

leoisl commented Nov 1, 2022

We might be able to keep -I 12G and still use less RAM. The trick would be to use this minimap2 param:

       --idx-no-seq
                 Don't  store  target  sequences  in  the index. It saves disk
                 space and memory but the index  generated  with  this  option
                 will  not  work  with -a or -c.  When base-level alignment is
                 not requested, this option is automatically applied.

... although when we map reads to the decontamination minimap2 index, we do require base-level mapping (i.e. we run with flags -aL). But looking downstream I think we don't need these flags and can parse a PAF file. It all depends on whether we actually need to decrease RAM or not. @FlorianePoint could you please tell us if you have observed any RAM issues when running tbpore either at your site or in Madagascar? Thanks!
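
As a rough idea of what parsing a PAF file instead of SAM could look like, here is a minimal sketch. It relies only on the standard PAF layout (12 mandatory tab-separated columns, with the query name in column 1 and the target name in column 6). The function name `reads_by_target` and the example records are made up for illustration and are not taken from tbpore.

```python
from typing import Dict, Iterable, Set


def reads_by_target(paf_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Group read (query) names by the target sequence they map to."""
    hits: Dict[str, Set[str]] = {}
    for line in paf_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected at least 12 PAF columns, got {len(fields)}")
        query_name, target_name = fields[0], fields[5]
        hits.setdefault(target_name, set()).add(query_name)
    return hits


# Made-up PAF records, for illustration only.
example_paf = [
    "read_a\t1000\t0\t950\t+\tcontam_1\t50000\t100\t1050\t900\t950\t60",
    "read_b\t2000\t10\t1990\t-\ttb_ref\t4400000\t500\t2480\t1900\t1980\t60",
    "read_c\t1500\t0\t1400\t+\tcontam_1\t50000\t2000\t3400\t1300\t1400\t30",
]

for target, reads in sorted(reads_by_target(example_paf).items()):
    print(target, sorted(reads))
```

Only read-to-target assignments are kept here. Anything downstream that needs CIGAR strings or base-level alignments would still require -a or -c, and therefore an index that stores the target sequences.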

@FlorianePoint

Hi Leandro,
Yes, we (Nanah in Mada and I) have already had RAM issues when using tbpore, with minimap2 failing with return code -9. It happened when we had less than 13G free.
Floriane
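
A note on reading that return code, assuming it is reported the way Python's `subprocess` module reports it (a negative value -N means the child process was terminated by signal N): -9 would then correspond to SIGKILL, which is what the Linux out-of-memory killer sends. The helper below is a hypothetical sketch for decoding such codes, not something tbpore provides.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess-style return code in plain words."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # Negative values mean "terminated by signal -returncode" on POSIX.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"terminated by {name}"
    return f"exited with status {returncode}"


print(describe_returncode(-9))
```

A SIGKILL on its own does not prove that memory was the cause, but together with the "less than 13G free" observation it fits an out-of-memory kill.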

@mbhall88
Owner

mbhall88 commented Nov 1, 2022

> But looking downstream I think we don't need these flags and can parse a PAF file.

Correct. I used to extract the reads from the SAM, but have since switched to using seqkit grep to get the read ids from fastqs. So PAF will be fine I think.
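
For anyone following along, a seqkit grep call of this kind might be assembled as in the sketch below. It is not the actual tbpore invocation: the helper name and file names are hypothetical, and only the standard `seqkit grep` options `-f` (file of patterns, matched against read IDs by default), `-v` (invert match) and `-o` (output file) are used.

```python
from typing import List


def build_seqkit_grep_cmd(
    ids_file: str, fastq_in: str, fastq_out: str, invert: bool = False
) -> List[str]:
    """Return a seqkit grep command selecting reads whose IDs are in ids_file.

    With invert=True the listed reads are dropped instead of kept.
    The command is only assembled here, not executed.
    """
    cmd = ["seqkit", "grep"]
    if invert:
        cmd.append("-v")
    cmd += ["-f", ids_file, "-o", fastq_out, fastq_in]
    return cmd


# Hypothetical file names, for illustration only.
keep_cmd = build_seqkit_grep_cmd("keep_ids.txt", "reads.fq.gz", "kept.fq.gz")
print(" ".join(keep_cmd))
```

Since this step only needs a list of read IDs, the names in the first PAF column would be enough to drive it.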
