Chunk TAX_IDTAXA input to speed up taxonomic assignment #16

Open
jackscanlan opened this issue Aug 1, 2024 · 0 comments
Labels
rework Redoing or refining something

Comments

@jackscanlan
Collaborator

A big bottleneck at the moment with larger datasets is TAX_IDTAXA, which can only run on a single core. Its run time scales with the combination of the number of input sequences and the size of the reference database.

One way around this could be to split the input sequence table into fixed-size chunks (perhaps 100 sequences each?), run each chunk through TAX_IDTAXA, and combine the results at the end. This would require new TAX_CHUNK (?) and TAX_COMBINE (?) modules to bookend the existing module.
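
A rough sketch of the chunk/combine idea, in Python purely for illustration (the actual modules would follow the pipeline's own conventions). All names here (`chunk_records`, `combine_assignments`, `assign_in_chunks`, and the `assign` callback standing in for a TAX_IDTAXA run) are hypothetical. It also assumes that the assignment for a sequence does not depend on which other sequences share its chunk, which would need checking before relying on this.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical record type: (sequence ID, sequence string).
Record = Tuple[str, str]


def chunk_records(records: List[Record], chunk_size: int = 100) -> Iterator[List[Record]]:
    """Yield consecutive chunks of at most `chunk_size` records (the TAX_CHUNK step)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def combine_assignments(chunk_results: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge per-chunk {sequence ID: taxonomy} results (the TAX_COMBINE step)."""
    combined: Dict[str, str] = {}
    for result in chunk_results:
        # A sequence ID appearing in two chunks would mean the split went wrong.
        overlap = combined.keys() & result.keys()
        if overlap:
            raise ValueError(f"duplicate sequence IDs across chunks: {sorted(overlap)}")
        combined.update(result)
    return combined


def assign_in_chunks(
    records: List[Record],
    assign: Callable[[List[Record]], Dict[str, str]],
    chunk_size: int = 100,
) -> Dict[str, str]:
    """Run `assign` (a stand-in for TAX_IDTAXA) on each chunk and merge the output.

    The chunks are processed one after another here; in the pipeline each
    chunk would presumably become its own task so they can run concurrently.
    """
    return combine_assignments(
        [assign(chunk) for chunk in chunk_records(records, chunk_size)]
    )
```

With 250 input sequences and a chunk size of 100, `chunk_records` gives chunks of 100, 100 and 50 sequences, and the combined result has one entry per input sequence.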

@jackscanlan jackscanlan added the rework Redoing or refining something label Aug 1, 2024