At the moment, processes request resources (CPUs, memory and time) via the Nextflow `label` system. This works well for processes whose resource requirements are fixed largely regardless of inputs (e.g. `PARSE_INPUTS` will always require minimal resources), but poorly for most others, whose requirements scale with the size of the input files, particularly read files and database files. For this pipeline, this would mostly affect the time and memory requirements of each process, as CPU requirements are probably fixed for most processes.
The benefit of this would be much more efficient use of HPC resources, and therefore potentially faster SLURM execution times.
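For context, label-based allocation is static: every process carrying a given label gets the same request regardless of input. A minimal sketch of what this typically looks like in `nextflow.config` (the label name and values here are illustrative, not this pipeline's actual config):

```nextflow
process {
    withLabel: 'small' {
        cpus   = 1
        memory = 4.GB
        time   = 1.h
    }
}
```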
See here for one particular (if-else) implementation, but multiplying resources by read count, file size, database size etc. is also possible using Groovy code.
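A rough sketch of both styles using Nextflow's dynamic directives, which accept closures that are evaluated per task and can reference the process inputs. The process name, command, thresholds and coefficients are all guesses to be tuned empirically, not measured values:

```nextflow
process READ_FILTER {
    input:
    path reads

    // If-else style: step the memory request up in bands based on
    // the input file size in bytes (thresholds/values are guesses)
    memory {
        if( reads.size() < 1_000_000_000 ) return 4.GB
        if( reads.size() < 5_000_000_000 ) return 8.GB
        return 16.GB
    }

    // Multiplicative style: one hour per (started) gigabyte of input
    time { 1.h * (1 + reads.size().intdiv(1_000_000_000)) }

    script:
    """
    some_filter_command ${reads}
    """
}
```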
Note that different processes might scale differently -- e.g. `READ_FILTER` time might scale linearly with the number of input reads, but a hypothetical all-vs-all alignment step might scale quadratically with the number of input sequences. Finding the right relationship could require some experimentation in some cases, but assuming linearity to start with is probably safe.
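For the quadratic case, something like the following could work. Everything here is hypothetical: `n_seqs` would need to be counted in an upstream process and passed through as a `val` input, and the coefficients are placeholders:

```nextflow
process ALL_VS_ALL_ALIGN {
    input:
    tuple val(n_seqs), path(seqs)

    // Quadratic scaling: runtime grows with the square of the sequence
    // count. For a linearly scaling process like READ_FILTER, drop the
    // squaring, e.g. time { 1.h * (1 + n.intdiv(1_000_000)) }
    time {
        def n = n_seqs as long
        1.h * (1 + (n * n).intdiv(1_000_000_000_000))
    }

    script:
    """
    some_aligner --all-vs-all ${seqs}
    """
}
```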
This is becoming a noticeable issue for multi-flowcell runs, where the `PHYLOSEQ_*` processes can run out of memory.
One option would be memory-saving code clean-up (e.g. removing unneeded objects, although this makes debugging more challenging in some instances), but proportional resource allocation would also help.
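In the meantime, a common stopgap is to retry out-of-memory failures with an escalating request, so an OOM kill doesn't take down the whole run. A sketch, using a made-up `PHYLOSEQ_MERGE` as a stand-in for whichever `PHYLOSEQ_*` process is failing, with placeholder values and script:

```nextflow
process PHYLOSEQ_MERGE {
    input:
    path phyloseq_objects

    // Double the memory request on each retry; base value is a guess
    memory { 8.GB * task.attempt }

    // Exit status 137 usually means the task was killed (SIGKILL) for
    // exceeding its memory request; retry those, fail fast otherwise
    errorStrategy { task.exitStatus == 137 ? 'retry' : 'finish' }
    maxRetries 3

    script:
    """
    merge_phyloseq.R ${phyloseq_objects}
    """
}
```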