Chunk `TAX_IDTAXA` input to speed up taxonomic assignment #16

jackscanlan · 2024-08-01T06:03:39Z

A big bottleneck with larger datasets at the moment is `TAX_IDTAXA`, which can only run on a single core. Its run time scales with the combination of the number of input sequences and the size of the reference database.

One way around this could be to split the input sequence table into chunks of fixed size (perhaps 100 sequences each?), run each chunk through `TAX_IDTAXA` separately, and combine the results at the end. This would require new `TAX_CHUNK` (?) and `TAX_COMBINE` (?) modules to bookend the existing module.
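A rough, language-agnostic sketch of the chunk/combine idea is below. It is written in Python purely for illustration and is not the pipeline's implementation. The names `chunk_records`, `combine_assignments`, and `assign_in_chunks` are hypothetical, and the `assign` callable is a stand-in for whatever `TAX_IDTAXA` actually runs. It assumes each sequence's assignment does not depend on the other sequences in the same input, so splitting the input does not change the results.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical record type: (sequence ID, sequence string).
Record = Tuple[str, str]


def chunk_records(records: List[Record], chunk_size: int = 100) -> Iterator[List[Record]]:
    """Yield consecutive chunks of at most `chunk_size` records (TAX_CHUNK role)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def combine_assignments(chunk_results: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge per-chunk results into one mapping (TAX_COMBINE role)."""
    combined: Dict[str, str] = {}
    for result in chunk_results:
        # A sequence ID appearing in two chunks would indicate a chunking bug.
        duplicated = combined.keys() & result.keys()
        if duplicated:
            raise ValueError(f"sequence IDs found in more than one chunk: {sorted(duplicated)}")
        combined.update(result)
    return combined


def assign_in_chunks(
    records: List[Record],
    assign: Callable[[List[Record]], Dict[str, str]],
    chunk_size: int = 100,
) -> Dict[str, str]:
    """Run `assign` on each chunk, then combine the results.

    The chunks are processed one after another here; in a workflow manager
    each chunk would be a separate task that can run concurrently.
    """
    return combine_assignments(
        [assign(chunk) for chunk in chunk_records(records, chunk_size)]
    )


if __name__ == "__main__":
    # Placeholder classifier: labels every sequence "unclassified".
    def placeholder_assign(chunk: List[Record]) -> Dict[str, str]:
        return {seq_id: "unclassified" for seq_id, _ in chunk}

    demo = [(f"seq{i}", "ACGT") for i in range(250)]
    assignments = assign_in_chunks(demo, placeholder_assign, chunk_size=100)
    print(len(assignments))
```

The speed-up would come from running the chunks as independent tasks rather than from the chunking itself. Chunk size then trades per-task overhead (each task presumably has to load the reference database) against how evenly the work spreads across cores.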

jackscanlan added the `rework` label ("Redoing or refining something") on Aug 1, 2024
