Make process resource requirements depend dynamically on input file properties #6

Open
jackscanlan opened this issue Jul 22, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@jackscanlan
Collaborator

At the moment, processes request resources (CPU, memory and time) through the Nextflow label system. This works well for processes whose requirements are largely fixed regardless of inputs (e.g. PARSE_INPUTS will always require minimal resources), but not for most others, whose requirements scale with the size of their input files, particularly read files and database files. For this pipeline, this would mostly affect the time and memory requested by each process, as CPU requirements are probably fixed for most processes.

The benefit would be more efficient use of HPC resources, and therefore possibly shorter overall run times under SLURM.

See here for a particular (if-else) implementation, but multiplication of resources by read count, file size, database size etc. is also possible using Groovy code.
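As a rough illustration of both styles, the sketch below uses Nextflow dynamic directives (closures evaluated per task) on a simplified READ_FILTER. It assumes a single input file named `reads`; the thresholds, multipliers, output name and placeholder script are all hypothetical and would need tuning against real runs.

```groovy
// Hypothetical sketch: thresholds, multipliers and the script are illustrative only.
process READ_FILTER {
    // if-else style: choose a memory tier from the input file size in bytes
    memory { reads.size() < 1.GB.toBytes() ? 4.GB : 16.GB }

    // proportional style: 30 minutes per started GB of input, at least 30 minutes
    time { 30.min * Math.max(1.0d, Math.ceil(reads.size() / 1e9)) }

    input:
    path reads            // assumed to be one file, so size() is its size in bytes

    output:
    path 'filtered.fastq.gz'

    script:
    """
    cp ${reads} filtered.fastq.gz   # placeholder for the real filtering command
    """
}
```

If a process takes several files (e.g. paired reads), `reads` would be a list and `size()` would return the number of elements rather than bytes, so the file sizes would need to be summed instead.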

Note that different processes might scale differently: e.g. READ_FILTER time might scale linearly with the number of input reads, but a hypothetical all-vs-all alignment step might scale quadratically with the number of input sequences. Finding the relationship could require some experimentation in some cases, but it is probably safe to assume linearity at first.
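To make the difference concrete, here is a minimal sketch of a linear and a quadratic time request. It assumes the read or sequence count is computed upstream and passed in alongside the file as a `val`; the process ALL_VS_ALL_ALIGN, the coefficients and the placeholder scripts are hypothetical.

```groovy
// Hypothetical sketch: coefficients are placeholders, not measured values.
process READ_FILTER {
    // linear: 10 min overhead plus 5 min per started million reads
    time { 1.min * (10 + 5 * Math.ceil(n_reads / 1e6)) }

    input:
    tuple val(n_reads), path(reads)   // n_reads assumed to be counted upstream

    script:
    """
    echo "filtering ${reads}"         # placeholder for the real command
    """
}

process ALL_VS_ALL_ALIGN {
    // quadratic: 10 min overhead plus 1 min per (thousand sequences) squared
    time { 1.min * (10 + Math.pow(n_seqs / 1e3, 2)) }

    input:
    tuple val(n_seqs), path(seqs)     // n_seqs assumed to be counted upstream

    script:
    """
    echo "aligning ${seqs}"           # placeholder for the real command
    """
}
```

With these placeholder coefficients, doubling the input roughly doubles the variable part of the READ_FILTER request but quadruples that of ALL_VS_ALL_ALIGN, which is why assuming linearity everywhere could under-request for the latter.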

@jackscanlan
Collaborator Author

This is becoming a noticeable issue for multi-flowcell runs, where the PHYLOSEQ_* processes can run out of memory.

One option is to look for memory savings in the code itself (e.g. removing objects once they are no longer needed, although this makes debugging more challenging in some instances), but proportional resource allocation would also help.
