Should sequences from priority/target species be added to the DADA2 priors? #18

jackscanlan · 2024-08-02T03:56:10Z · Maintainer

Wondering your thoughts on this, @alexpiper: given that a key purpose of the pipeline is to detect specific species with high sensitivity, would it be possible (or worthwhile) to add sequences from these target species to the DADA2 priors before the second round of denoising?

I can imagine a scenario where a target species (A) differs from a non-target species (B) by perhaps a single nucleotide, and A is present at a real but very low abundance in a sample while B is highly abundant. Is there a risk that DADA2 would treat the A sequence as an error-child of B and end up removing A from that sample? Would adding the A sequence to the priors, even if it wouldn't otherwise qualify as one, improve sensitivity? Or would this just inflate false-positive detections?
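To make the A/B scenario concrete, here is a deliberately simplified Python illustration of the Poisson abundance p-value described in the DADA2 paper: the expected number of error reads of A produced from B is n_B × λ_BA, and A is only split off as its own sequence if its observed read count is improbably high. Everything here is an assumption for illustration: λ_BA = 1e-4 and 100,000 reads of B are made-up numbers, `OMEGA_A = 1e-40` and `OMEGA_P = 1e-4` are the `setDadaOpt` defaults as I understand them, and my reading that prior sequences are tested with an unconditioned p-value against `OMEGA_P` should be checked against the dada2 source. It ignores most of the real algorithm (error-model fitting, comparison against the closest partition, multiple uniques, etc.).

```python
import math

OMEGA_A = 1e-40  # assumed default abundance p-value threshold (setDadaOpt)
OMEGA_P = 1e-4   # assumed default threshold for sequences supplied as priors


def poisson_upper_tail(a: int, lam: float) -> float:
    """P(X >= a) for X ~ Poisson(lam), summed term by term so tiny tails stay accurate."""
    if a <= 0:
        return 1.0
    # Sum well past both a and the mean; remaining terms are negligible.
    upper = int(max(a, lam) + 20 * math.sqrt(lam + 1) + 100)
    log_lam = math.log(lam)
    tail = sum(math.exp(k * log_lam - lam - math.lgamma(k + 1)) for k in range(a, upper + 1))
    return min(tail, 1.0)


def abundance_pvalue(reads_a: int, reads_b: int, lam_ba: float, is_prior: bool) -> float:
    """Simplified p-value for 'all of A's reads are sequencing/PCR errors from B'."""
    expected = reads_b * lam_ba  # expected error reads of A generated from B
    p = poisson_upper_tail(reads_a, expected)
    if not is_prior:
        # Ordinary sequences: condition on A having been observed at least once.
        p /= -math.expm1(-expected)  # equals 1 - exp(-expected)
    return p


def kept_as_distinct(reads_a: int, reads_b: int, lam_ba: float, is_prior: bool) -> bool:
    """True if A would be split off from B under this toy model."""
    p = abundance_pvalue(reads_a, reads_b, lam_ba, is_prior)
    return p < (OMEGA_P if is_prior else OMEGA_A)


# Hypothetical numbers: B has 100,000 reads, 1e-4 chance a B read is misread as A.
for reads_a in (20, 40, 200):
    print(reads_a,
          kept_as_distinct(reads_a, 100_000, 1e-4, is_prior=False),
          kept_as_distinct(reads_a, 100_000, 1e-4, is_prior=True))
```

With these made-up numbers, 20 reads of A are absorbed into B either way, 40 reads survive only when A is supplied as a prior, and 200 reads survive regardless. That middle band is where adding targets as priors could plausibly matter.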

alexpiper · 2024-08-02T05:05:48Z · Maintainer

Great question, and it could potentially be an issue when only a single mutation separates different species (this is the case with some of the Tephritid species we are targeting). I think anything that increases sensitivity for target species is worth exploring, and it helps consolidate the 'niche' of this pipeline.

While it could also conceivably lead to false positives, my intuition is that this would only occur if a PCR error in an innocuous sequence just happens to fall at the right position to make it match the 'target' prior, which would be pretty rare. And my philosophy here is that it's better to be more sensitive and follow up a few (false) positive detections than to miss a target species altogether.

I remember starting to look into this years ago, and found this old (unfortunately incomplete and not very informative) Rmd: https://github.com/alexpiper/Drosophila_metabarcoding/blob/4e2a5e51081e3129de96006848eebfbc9d5443c7/priors.Rmd

At the time I was testing it on a real (NovaSeq) dataset of Drosophila traps, and I think my finding was that including targets as priors didn't have any noticeable impact (at least on that dataset). But it was a very rough analysis, and it would be worth repeating more robustly, potentially with a simulated dataset.

There are a couple of annoyances with how these priors are handled; e.g. for paired-end data there has to be a separate set of priors for the forward and reverse reads (see benjjneb/dada2#883), so the 'target' sequences provided by the user need further processing before they can be used as priors.
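A minimal Python sketch of that extra processing, under assumptions that are mine rather than from the linked issue: each target is supplied as the full amplicon in forward-read orientation with primers already removed, and reads were truncated to fixed lengths (e.g. via `truncLen` in `filterAndTrim`), so each prior must be an exact prefix of that length. `paired_priors` is a hypothetical helper; its two outputs would be passed as the `priors` argument to the separate `dada()` calls on forward and reverse reads.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def paired_priors(amplicons: list[str], trunc_fwd: int, trunc_rev: int) -> tuple[list[str], list[str]]:
    """Split full target amplicons into forward- and reverse-read priors (hypothetical helper).

    Assumes amplicons are in forward-read orientation with primers removed and
    that reads were truncated to trunc_fwd / trunc_rev bases before denoising.
    """
    fwd, rev = [], []
    for amp in amplicons:
        amp = amp.upper()
        if not set(amp) <= set("ACGT"):
            raise ValueError(f"ambiguous bases not supported in prior: {amp}")
        if len(amp) < max(trunc_fwd, trunc_rev):
            raise ValueError(f"amplicon shorter than a truncation length: {amp}")
        fwd.append(amp[:trunc_fwd])                       # forward read = start of amplicon
        rev.append(reverse_complement(amp)[:trunc_rev])   # reverse read = start of the RC
    # Distinct targets can share a read prefix, so drop duplicates while keeping order.
    return list(dict.fromkeys(fwd)), list(dict.fromkeys(rev))
```

If reads are not truncated to a fixed length (for example, variable lengths left after primer trimming), exact prefix priors like these would likely not match the dereplicated reads, so this approach depends on the trimming setup.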

On the topic of priors and the 2-step ASV inference we are currently doing, I think it does increase the overall number of spurious sequences that make it through DADA2. I wonder if filtering the ASVs (for chimeras, pseudogenes, etc.) from the first round of DADA2 before using them in the second round might also be worth exploring?
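As one illustration of the pseudogene part of that idea, a minimal Python sketch that screens first-round ASVs for stop codons before they are reused as priors. It assumes a protein-coding mitochondrial marker such as COI on the forward strand, an unknown reading frame, and the invertebrate mitochondrial code (NCBI translation table 5, where only TAA and TAG are stops). `filter_putative_pseudogenes` is a hypothetical helper, not part of DADA2 or this pipeline; chimera removal (e.g. DADA2's `removeBimeraDenovo` in R) would be a separate step.

```python
# Stop codons in the invertebrate mitochondrial code (NCBI translation table 5).
# TGA is NOT a stop here (it codes for Trp).
MITO_INVERT_STOPS = {"TAA", "TAG"}


def has_stop_codon(seq: str, frame: int) -> bool:
    """True if any complete codon in the given forward frame (0, 1 or 2) is a stop."""
    seq = seq.upper()
    return any(seq[i:i + 3] in MITO_INVERT_STOPS for i in range(frame, len(seq) - 2, 3))


def filter_putative_pseudogenes(asvs: list[str]) -> list[str]:
    """Keep ASVs with at least one stop-free forward reading frame (hypothetical helper)."""
    return [s for s in asvs if any(not has_stop_codon(s, f) for f in range(3))]
```

Checking all three frames is the conservative choice when the frame is unknown; a pipeline that knows the marker's frame could test just that one and catch more pseudogenes.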


jackscanlan Nov 14, 2024
Maintainer Author

> On the topic of priors and the 2-step ASV inference we are currently doing, I think it does increase the overall number of spurious sequences that make it through DADA2. I wonder if filtering the ASVs (for chimeras, pseudogenes, etc.) from the first round of DADA2 before using them in the second round might also be worth exploring?

If you don't do a second round of denoising, you'd be filtering those sequences anyway, so there's probably no downside to doing this (other than maybe computational efficiency, although which step is more intensive: filtering or denoising?). That said, I wonder whether spurious priors would just get picked up by the normal filtering anyway. And if they're spurious in ways our filtering can't detect, pre-filtering the priors probably won't do anything either.

alexpiper · 2024-08-02T05:13:02Z · Maintainer

See the explicit recommendation on using priors for this here: benjjneb/dada2#1195


jackscanlan Nov 14, 2024
Maintainer Author

This seems like a good enough reason to implement this! I'll make an issue.
