Sourmash ONT taxonomy questions #3480

peterdoug · 2025-01-10T14:58:53Z

First, thanks for such a great tool! Sourmash (especially with the branchwater plugin) is incredibly impressive.

Firstly, I'm interested in using sourmash for ONT metagenome taxonomic assignment. From the paper "Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets", sourmash seems to struggle a bit with ONT.
If the root issue here is low read accuracy, does anyone have any experience with pre-processing using read error correction tools such as herro or dechat?

Secondly, does anyone have experience with using translated long reads with sourmash-sketched protein databases for taxonomic assignment? Do you expect this to improve taxonomic assignment?
In relation to this, I see that base sourmash currently supports translated sketching. Is this a feature you are considering adding to the branchwater plugin?

Thanks for any answers or comments!

ctb · 2025-01-12T19:14:57Z

> First, thanks for such a great tool! Sourmash (especially with the branchwater plugin) is incredibly impressive.

Thank you! Flattery will get you many places :)

> Firstly, I'm interested in using sourmash for ONT metagenome taxonomic assignment. From the paper "Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets", sourmash seems to struggle a bit with ONT. If the root issue here is low read accuracy, does anyone have any experience with pre-processing using read error correction tools such as herro or dechat?

We do not, but would love to hear back!

Please also see contig-level gather (#3095), which is being worked on, albeit on the back burner.

> Secondly, does anyone have experience with using translated long reads with sourmash-sketched protein databases for taxonomic assignment? Do you expect this to improve taxonomic assignment?

Mmmh, I do not have good intuition here. I could see it going either way:

smaller k-sizes + increased sensitivity of protein matches across long evolutionary distances => improvement
more matches, including spurious ones, and increased computational costs associated with more matches => degradation
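
To make the first point concrete, here is a minimal pure-Python sketch, not sourmash itself. It uses exact k-mer sets instead of hashed, scaled sketches, and toy k-sizes (DNA k=9, protein k=3) so that the short sequences still yield k-mers. `REF_GENE` and `READ` are hypothetical sequences made up for illustration; they differ only at synonymous third-codon positions. All function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translations(dna):
    """Three forward frames plus three reverse-complement frames."""
    revcomp = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, revcomp) for offset in range(3)]


def kmers(seq, k):
    """Exact set of k-mers (no hashing or subsampling)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def translated_kmers(dna, k):
    """Protein k-mers from all six frames of a DNA sequence."""
    found = set()
    for protein in six_frame_translations(dna):
        found |= kmers(protein, k)
    return found


def containment(query, target):
    """Fraction of query k-mers that are also present in target."""
    return len(query & target) / len(query) if query else 0.0


# Hypothetical sequences: same peptide, different synonymous codons.
REF_GENE = "".join("ATG AAA CTG GTT ACC GCA GGT TCT CCG GAA CGT ATT".split())
READ = "".join("ATG AAG CTT GTC ACA GCG GGC TCC CCA GAG CGC ATC".split())

ref_protein_kmers = kmers(translate(REF_GENE), 3)

print("DNA containment (k=9):    ", containment(kmers(REF_GENE, 9), kmers(READ, 9)))
print("protein containment (k=3):", containment(ref_protein_kmers, translated_kmers(READ, 3)))
print("protein k-mers from six frames:", len(translated_kmers(READ, 3)),
      "vs in-frame only:", len(kmers(translate(READ), 3)))
```

The reference protein k-mers are all recovered from the translated read even though the DNA differs, which is the "improvement" case. The last line hints at the second point: six-frame translation produces many extra k-mers from the off-frame translations, and each one is an additional opportunity for a spurious match.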

@bluegenes thoughts?

> In relation to this, I see that base sourmash currently supports translated sketching. Is this a feature you are considering adding to the branchwater plugin?

It is not so far away - sourmash-bio/sourmash_plugin_branchwater#262 and sourmash-bio/sourmash_plugin_branchwater#520 - so if we had a good reason it would be straightforward to add.

Such a reason might be you finding that it works well in small test circumstances and now you want to scale up... :)

bluegenes · 2025-01-16T19:29:59Z

> > First, thanks for such a great tool! Sourmash (especially with the branchwater plugin) is incredibly impressive.

> Thank you! Flattery will get you many places :)

> > Firstly, I'm interested in using sourmash for ONT metagenome taxonomic assignment. From the paper "Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets", sourmash seems to struggle a bit with ONT. If the root issue here is low read accuracy, does anyone have any experience with pre-processing using read error correction tools such as herro or dechat?

> We do not, but would love to hear back!

I really think this should help, but haven't tried it yet. I also expect newer Nanopore datasets will work better b/c of accuracy improvements over time.
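
A back-of-the-envelope sketch of why read accuracy should matter for k-mer matching. It assumes independent, uniform per-base errors, which is a simplification (real ONT errors are not uniform), and the function name is illustrative, not part of sourmash. Under that assumption, the chance that a k-mer contains no error is (1 - e)^k.

```python
def error_free_kmer_fraction(per_base_error, ksize):
    """Expected fraction of k-mers with no sequencing error.

    Assumes each base is wrong independently with probability
    `per_base_error` (a simplifying assumption).
    """
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be between 0 and 1")
    if ksize < 1:
        raise ValueError("ksize must be a positive integer")
    return (1.0 - per_base_error) ** ksize


# Compare a few per-base error rates at some DNA k-sizes.
for error_rate in (0.10, 0.05, 0.01, 0.001):
    for ksize in (21, 31, 51):
        fraction = error_free_kmer_fraction(error_rate, ksize)
        print(f"error={error_rate:<6} k={ksize:<3} error-free k-mers ~ {fraction:.3f}")
```

Under this model, at a 5% per-base error rate only about 20% of 31-mers are expected to be error-free, compared with about 73% at 1%, so either error correction or newer, more accurate chemistry should leave many more k-mers available to match.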

> Please also see contig-level gather (#3095), which is being worked on, albeit on the back burner.

> > Secondly, does anyone have experience with using translated long reads with sourmash-sketched protein databases for taxonomic assignment? Do you expect this to improve taxonomic assignment?

> Mmmh, I do not have good intuition here. I could see it going either way:

> smaller k-sizes + increased sensitivity of protein matches across long evolutionary distances => improvement

> more matches, including spurious ones, and increased computational costs associated with more matches => degradation

> @bluegenes thoughts?

What specificity are you looking for? From testing a while back, protein classification for microbes seemed to work well at the genus level, but may not be able to properly find the best species match within there.
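
One way to picture the genus-versus-species point is a toy rank summarization: if matches are spread across sibling species, the species-level call is ambiguous while the genus-level total is not. The lineages and fractions below are made up for illustration, and `summarize_at_rank` is a hypothetical helper, not the sourmash taxonomy code.

```python
from collections import defaultdict

RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def summarize_at_rank(matches, rank):
    """Sum match fractions over lineages truncated to `rank`.

    `matches` is an iterable of (semicolon-separated lineage, fraction).
    Lineages that do not reach `rank` are skipped.
    """
    depth = RANKS.index(rank) + 1
    totals = defaultdict(float)
    for lineage, fraction in matches:
        names = lineage.split(";")
        if len(names) < depth:
            continue
        totals[";".join(names[:depth])] += fraction
    return dict(totals)


# Made-up example: two sibling species split the signal within one genus.
PREFIX = "d__A;p__B;c__C;o__D;f__E"
matches = [
    (f"{PREFIX};g__GenusX;s__GenusX alpha", 0.25),
    (f"{PREFIX};g__GenusX;s__GenusX beta", 0.125),
    (f"{PREFIX};g__GenusY;s__GenusY gamma", 0.5),
]

for rank in ("species", "genus"):
    print(rank)
    for lineage, total in summarize_at_rank(matches, rank).items():
        print(f"  {total:.3f}  {lineage}")
```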

> > In relation to this, I see that base sourmash currently supports translated sketching. Is this a feature you are considering adding to the branchwater plugin?

> It is not so far away - sourmash-bio/sourmash_plugin_branchwater#262 and sourmash-bio/sourmash_plugin_branchwater#520 - so if we had a good reason it would be straightforward to add.

> Such a reason might be you finding that it works well in small test circumstances and now you want to scale up... :)

peterdoug · 2025-01-23T09:03:30Z

Thanks a lot for your replies! I have unfortunately not had the time to test this yet, but I'll let you know when I do :)
