Should sequences from priority/target species be added to the DADA2 priors? #18
-
Great question, and it could potentially be an issue when only a single mutation separates different species (this is the case for some of the Tephritid species we are targeting). I think anything that increases sensitivity for target species is worth exploring, and it helps consolidate the 'niche' of this pipeline. While it could conceivably lead to false positives, my intuition is that this would only occur if a PCR error in an innocuous sequence just happens to fall at the right position to make it match the 'target' prior, which would be pretty rare. My philosophy here is that it's better to be more sensitive and follow up a few (false) positive detections than to miss a target species altogether.

I remember starting to look into this years ago, and found this old (unfortunately incomplete and not very informative) RMD: https://github.com/alexpiper/Drosophila_metabarcoding/blob/4e2a5e51081e3129de96006848eebfbc9d5443c7/priors.Rmd

At the time I was testing it on a real (NovaSeq) dataset of Drosophila traps, and I think my finding was that including targets as priors didn't have any noticeable impact (at least on that dataset). But it was a very rough analysis, and it would be worth doing again more robustly, potentially with a simulated dataset.

There are a couple of annoyances with how these priors are handled. For example, for paired-end data there has to be a separate prior for forward and reverse sequences (see benjjneb/dada2#883), so the 'target' sequences provided by the user need to be further processed before use as priors.

On the topic of priors and the 2-step ASV inference we are currently doing, I think it does increase the overall number of spurious sequences that make it through DADA2. I wonder if filtering the ASVs (for chimeras, pseudogenes, etc.) from the first round of DADA2 before using them in the second round might also be worth exploring?
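To make the forward/reverse point concrete, here is a minimal sketch of the extra processing that user-supplied target sequences might need. It assumes the targets are full-length amplicons with primers already removed, and that a prior has to look like the read after truncation: the start of the amplicon for forward reads, and the start of its reverse complement for reverse reads. The function names and truncation lengths are made up for illustration and are not part of the pipeline or of DADA2.

```python
# Hypothetical helpers: derive separate forward and reverse priors
# from full-length target amplicons (assumption: primers already trimmed).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_priors(targets, trunc_len_fwd, trunc_len_rev):
    """Build de-duplicated forward and reverse prior lists.

    trunc_len_fwd / trunc_len_rev are placeholder truncation lengths;
    targets shorter than a truncation length are skipped for that direction.
    """
    fwd, rev = set(), set()
    for seq in targets:
        seq = seq.upper()
        if len(seq) >= trunc_len_fwd:
            fwd.add(seq[:trunc_len_fwd])
        if len(seq) >= trunc_len_rev:
            rev.add(reverse_complement(seq)[:trunc_len_rev])
    return sorted(fwd), sorted(rev)


# Toy targets that differ only at the final base.
targets = ["ACGTTGCAAC", "ACGTTGCAAT"]
fwd_priors, rev_priors = split_priors(targets, trunc_len_fwd=6, trunc_len_rev=4)
print(fwd_priors)  # forward priors
print(rev_priors)  # reverse priors
```

In this toy case the two targets collapse into a single forward prior, because the one differing base sits beyond the forward truncation point, and only the reverse priors tell them apart. The two lists would then be the candidates for the `priors` argument of the forward and reverse `dada()` calls respectively.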
-
See the explicit recommendation on using priors for this here: benjjneb/dada2#1195
-
Wondering your thoughts on this, @alexpiper: given a key purpose of the pipeline is to very sensitively detect specific species, would it be possible/worth adding these sequences to the DADA2 priors before doing the second round of denoising?
I can imagine a scenario where a target species (A) has a very similar sequence (perhaps a single nucleotide difference) to a non-target (B), and A is at a real but very low abundance in a sample, while B is at a very high abundance. Is there a risk that DADA2 would treat the A sequence as an error-child of B and end up removing A from that sample? Would adding the A sequence to the priors, even if it otherwise didn't qualify, improve sensitivity? Or would this just lead to inflated false-positive detections?
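One rough way to see how often the A/B scenario could even arise is to list the target sequences that sit within a single substitution of a non-target. The sketch below does only that; it says nothing about what DADA2 would actually do with them. It assumes equal-length sequences and ignores indels, and the names and sequences are invented for illustration.

```python
# Hypothetical check: which targets are one substitution away from a non-target?

def hamming(a, b):
    """Number of mismatched positions, or None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def at_risk_targets(targets, non_targets, max_diff=1):
    """Return (target, non_target, distance) for pairs within max_diff substitutions."""
    flagged = []
    for t_name, t_seq in targets.items():
        for n_name, n_seq in non_targets.items():
            d = hamming(t_seq, n_seq)
            if d is not None and 0 < d <= max_diff:
                flagged.append((t_name, n_name, d))
    return flagged


# Invented example: target A differs from non-target B at one position.
targets = {"A": "ACGTTGCAAC"}
non_targets = {"B": "ACGTTGCAAT", "C": "ACGAAGCTAC"}
print(at_risk_targets(targets, non_targets))
```

Here only the A/B pair is flagged, since C differs from A at three positions.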