`MATCH_BOLD` fails to cache on `-resume` #43

jackscanlan · 2024-11-06T01:18:15Z

`MATCH_BOLD` tasks often re-run instead of being retrieved from the cache when resuming with `-resume`, although this is inconsistent (sometimes 1 task is cached, sometimes 21, etc.).

This is a problem because, when an error occurs in a later process, the resumed pipeline has to restart from a relatively early step. It is not a high priority, though, as this mainly affects development at this stage.

I suspect this might have something to do with the use of `collect()` in the `taxreturn.nf` workflow:

        //// extract sequences from BOLD database file
        EXTRACT_BOLD (
            ch_bold_db_chunks,
            PARSE_TARGETS.out.bold_names,
            PARSE_TARGETS.out.bold_rank,
            PARSE_MARKER.out.bold_query
        )

        //// match BOLD taxon names to NCBI taxon names
        MATCH_BOLD (
            EXTRACT_BOLD.out.tibble, 
            GET_NCBI_TAXONOMY.out.lineageparents,
            GET_NCBI_TAXONOMY.out.synonyms
        )

        //// merge BOLD chunks into single .fasta and .csv files
        MERGE_BOLD (
            MATCH_BOLD.out.fasta.collect(),
            MATCH_BOLD.out.matching_taxids.collect(),
            MATCH_BOLD.out.synchanges.collect()
        )

This is unclear, though. `collectFile()` used to cause issues: nextflow-io/nextflow#3466
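
A possible reason `collect()` is a plausible suspect: it gathers items in the order the upstream tasks finish, which can differ between runs, and a cache key computed over an ordered list changes whenever that order changes. The toy model below is a minimal sketch, assuming the cache key is simply a hash of the input names in list order (Nextflow's real task hash also covers the script, container and other settings); the chunk file names and the `task_hash`/`collect` helpers are hypothetical. Note that this would most directly affect `MERGE_BOLD`, which receives the collected lists, so it may not fully explain `MATCH_BOLD` itself re-running.

```python
import hashlib


def task_hash(inputs: list[str]) -> str:
    """Toy cache key: SHA-256 over the input names in the order given.

    Assumption: this only models the order sensitivity of list inputs;
    Nextflow's real task hash also covers the script, container, etc.
    """
    h = hashlib.sha256()
    for name in inputs:
        h.update(name.encode())
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] hash differently
    return h.hexdigest()


def collect(emitted: list[str], sort: bool = False) -> list[str]:
    """Mimic collect(): keep items in emission (task-completion) order unless sorted."""
    return sorted(emitted) if sort else list(emitted)


# Hypothetical MATCH_BOLD outputs, in the order the chunk tasks happened to finish
first_run = ["chunk_02.fasta", "chunk_01.fasta", "chunk_03.fasta"]
resumed_run = ["chunk_01.fasta", "chunk_03.fasta", "chunk_02.fasta"]

print("unsorted keys match:",
      task_hash(collect(first_run)) == task_hash(collect(resumed_run)))
print("sorted keys match:",
      task_hash(collect(first_run, sort=True)) == task_hash(collect(resumed_run, sort=True)))
```

If ordering turns out to be the cause, Nextflow's `collect` operator accepts a `sort` option (e.g. `MATCH_BOLD.out.fasta.collect(sort: true)`), which makes the collected order deterministic; this has not been tested against this pipeline.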

jackscanlan added the `bug` (Something isn't working) and `low priority` labels on Nov 6, 2024