Accessing 16S gene identifiers #734
Comments
Hi there, I actually never looked into that, so I am not aware of a solution or whether it could be implemented. Maybe someone else can chime in. For more context, could you describe why that would help you and what you would gain, i.e. what is your use case? Might it be of more general interest as well?
Thanks for the swift reply! The reason for the request is that when comparing 16S results from databases like SILVA with whole-genome-based methods (which use NCBI Taxonomy), one stumbles upon the problem of outdated taxonomic labels in SILVA and discrepancies between OTUs and genomic taxonomy. Since SILVA stores the GenBank primary accession of each 16S gene (and from what I see some of the ampliseq database files do that too), it is possible to use it to look up the NCBI taxonomy assigned to the gene, which is likely more up to date and in line with the whole genomes' taxonomy.
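To make the idea concrete, here is a minimal sketch of such a lookup using the NCBI E-utilities `efetch` endpoint. It requests the GenBank flat file for one accession and pulls the NCBI taxon ID out of the `/db_xref="taxon:..."` qualifier. The function names are hypothetical and not part of ampliseq, and it assumes the accession resolves in the `nuccore` database.

```python
import re
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build the efetch URL for one GenBank accession (hypothetical helper)."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def extract_taxid(genbank_text: str) -> Optional[int]:
    """Return the first NCBI taxon ID found in a GenBank flat file, if any."""
    match = re.search(r'/db_xref="taxon:(\d+)"', genbank_text)
    return int(match.group(1)) if match else None


def fetch_taxid(accession: str) -> Optional[int]:
    """Download one record and extract its taxon ID (needs network access)."""
    with urlopen(efetch_url(accession)) as response:
        return extract_taxid(response.read().decode("utf-8"))
```

For a whole reference database this would need batching and throttling, since NCBI limits request rates. The taxon ID would then still have to be resolved to a lineage with NCBI Taxonomy.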
I see. The outdated taxonomies could probably be improved that way.
Great recommendations! I've launched GTDB-based classification with DADA2; however, I see that some crucial taxa are not resolved beyond phylum/class/family level, even though SILVA assigns genus/species level to the same ASV. It seems to me that the GTDB database has different content at the sequence level than the default SILVA, and hence the classification results differ? Would the same be the case for the Kraken2 database?
Databases are non-trivial to compare, so if you do not find a "crucial" taxon in one database, turn to another one.
I don't know, you would need to test.
IMO, "sbdi-gtdb" is better than "gtdb" as we know there are rRNA sequences in the GTDB collection that are assigned to the wrong species. "sbdi-gtdb" is phylogenetically vetted to remove these.
Thanks for all the suggestions. I didn't expect benchmarking the technologies to be such a challenge. It looks like it requires a lot of manual research to map corresponding taxonomic labels; otherwise, when plotted side by side, the data look as if the results were completely different.
I am currently benchmarking; I hope that will shed some light on it.
Thank you so much Daniel! Now that I think about it, could it be a matter of different GTDB versions? For full-genome methods, we use the latest release, 214. As far as I can see, SBDI is tied to release 207, which means I see Bacillota_A in shotgun results but Firmicutes_A in ampliseq results. The major phylum name changes are probably not accounted for in release 207?
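As a stopgap for plotting the two result sets side by side, the older labels could be harmonized before comparison. The sketch below is only illustrative: the single Firmicutes_A to Bacillota_A pair comes from this thread, while the rule "rename the base name and keep the `_A`-style suffix" and the optional `p__` prefix handling are assumptions. The function and table names are hypothetical.

```python
# Base-name renames between GTDB releases. Only Firmicutes_A -> Bacillota_A is
# taken from this thread; applying the rename to the base name and keeping any
# suffix is an assumption. Extend the table from the GTDB release notes.
PHYLUM_RENAMES = {"Firmicutes": "Bacillota"}


def harmonize_phylum(label: str) -> str:
    """Map an older phylum label onto the newer name, keeping prefix and suffix."""
    prefix = "p__" if label.startswith("p__") else ""
    name = label[len(prefix):]
    # Split "Firmicutes_A" into base "Firmicutes" and suffix "A".
    base, sep, suffix = name.partition("_")
    return prefix + PHYLUM_RENAMES.get(base, base) + sep + suffix


if __name__ == "__main__":
    for old in ["Firmicutes_A", "p__Firmicutes_A", "Proteobacteria"]:
        print(old, "->", harmonize_phylum(old))
```

Labels that are not in the table pass through unchanged, so the function can be applied to a whole phylum column. Renaming labels does not fix differences in the underlying sequences or lineages between releases.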
That's it. I'm working on SBDI-GTDB 08RS214, and soon release 09 (when that's released, likely in late April).
That's precious. If I set the repository to watch releases, will that be enough to be notified when it becomes available?
New releases of databases are included in new releases of the pipeline itself, so yes. Hopefully, I'll be done with the next release in time for Ampliseq 2.10.
That would be fantastic - thanks for all the work, I'm hitting the |
Description of feature
As one of the input database files (silva_species_assignment_v138.1.fa.gz) stores the primary GenBank accession of each 16S gene, I'm looking for the best way to trace those accessions back alongside the taxonomy assigned to an ASV.
Is there already a solution for tracking that, or could it be implemented?
Thanks in advance!
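For illustration, a rough sketch of pulling the accessions out of that file. It assumes each FASTA header has the form `>AB001438.1.1662 Halorubrum trapanicum`, i.e. `<accession>.<start>.<stop>` followed by genus and species. That layout should be checked against the actual file, and all names below are hypothetical.

```python
import gzip
from typing import Iterator, NamedTuple


class SpeciesRecord(NamedTuple):
    accession: str  # primary accession without coordinates, e.g. "AB001438"
    genus: str
    species: str


def parse_header(header: str) -> SpeciesRecord:
    """Split one FASTA header into accession, genus and species.

    Assumed layout: ">ACCESSION.START.STOP Genus species".
    """
    seq_id, _, taxon = header.lstrip(">").strip().partition(" ")
    parts = seq_id.split(".")
    # Drop the trailing start/stop coordinates when both are present.
    accession = ".".join(parts[:-2]) if len(parts) >= 3 else seq_id
    genus, _, species = taxon.partition(" ")
    return SpeciesRecord(accession, genus, species)


def iter_records(path: str) -> Iterator[SpeciesRecord]:
    """Yield one record per header line of a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                yield parse_header(line)
```

Something like `{r.accession: r for r in iter_records("silva_species_assignment_v138.1.fa.gz")}` would then give a lookup table. Linking those accessions to individual ASVs still depends on knowing which reference sequence each ASV matched.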